2022-05-18T04:12:11.2390591Z Requested labels: linux.8xlarge.nvidia.gpu
2022-05-18T04:12:11.2390679Z Job defined at: pytorch/pytorch/.github/workflows/_linux-test.yml@refs/heads/master
2022-05-18T04:12:11.2390706Z Waiting for a runner to pick up this job...
2022-05-18T04:13:22.1724143Z Job is about to start running on the runner: i-0f05d6101f258be9b (repository)
2022-05-18T04:13:27.3850623Z Current runner version: '2.291.1'
2022-05-18T04:13:27.3858232Z Runner name: 'i-0f05d6101f258be9b'
2022-05-18T04:13:27.3859132Z Runner group name: 'Default'
2022-05-18T04:13:27.3859873Z Machine name: 'ip-10-0-3-31'
2022-05-18T04:13:27.3862662Z ##[group]GITHUB_TOKEN Permissions
2022-05-18T04:13:27.3863677Z Actions: write
2022-05-18T04:13:27.3864140Z Checks: write
2022-05-18T04:13:27.3864516Z Contents: write
2022-05-18T04:13:27.3864975Z Deployments: write
2022-05-18T04:13:27.3865477Z Discussions: write
2022-05-18T04:13:27.3865863Z Issues: write
2022-05-18T04:13:27.3866316Z Metadata: read
2022-05-18T04:13:27.3866762Z Packages: write
2022-05-18T04:13:27.3867146Z Pages: write
2022-05-18T04:13:27.3867620Z PullRequests: write
2022-05-18T04:13:27.3868123Z RepositoryProjects: write
2022-05-18T04:13:27.3868566Z SecurityEvents: write
2022-05-18T04:13:27.3869019Z Statuses: write
2022-05-18T04:13:27.3869451Z ##[endgroup]
2022-05-18T04:13:27.3873716Z Secret source: Actions
2022-05-18T04:13:27.3874594Z Prepare workflow directory
2022-05-18T04:13:27.6739288Z Prepare all required actions
2022-05-18T04:13:27.6981564Z Getting action download info
2022-05-18T04:13:27.8965275Z Download action repository 'pytorch/pytorch@master' (SHA:7b8cf1f7366bff95e9954037a58a8bb0edaaebd3)
2022-05-18T04:13:30.9563899Z Download action repository 'nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a' (SHA:71062288b76e2b6214ebde0e673ce0de1755740a)
2022-05-18T04:13:31.1342321Z Download action repository 'seemethere/upload-artifact-s3@v4' (SHA:c1c31f57581a11fe6d4d052da6276adb2df71f1e)
2022-05-18T04:13:31.4775780Z Getting action download info
2022-05-18T04:13:31.6418003Z Download action repository 'malfet/checkout@silent-checkout' (SHA:f63e9e15406be6060f159846cd2e098f759c5246)
2022-05-18T04:13:31.8785806Z Getting action download info
2022-05-18T04:13:32.1603393Z ##[group]Run pytorch/pytorch/.github/actions/checkout-pytorch@master
2022-05-18T04:13:32.1603811Z with:
2022-05-18T04:13:32.1604051Z   submodules: recursive
2022-05-18T04:13:32.1604316Z   fetch-depth: 0
2022-05-18T04:13:32.1604569Z env:
2022-05-18T04:13:32.1604792Z   IN_CI: 1
2022-05-18T04:13:32.1605019Z   IS_GHA: 1
2022-05-18T04:13:32.1605274Z   GIT_DEFAULT_BRANCH: master
2022-05-18T04:13:32.1605527Z ##[endgroup]
2022-05-18T04:13:32.1901693Z ##[group]Run echo "${GITHUB_WORKSPACE}"
2022-05-18T04:13:32.1902072Z echo "${GITHUB_WORKSPACE}"
2022-05-18T04:13:32.1902380Z if [ -z "${NO_SUDO}" ]; then
2022-05-18T04:13:32.1902673Z   sudo rm -rf "${GITHUB_WORKSPACE}"
2022-05-18T04:13:32.1902950Z else
2022-05-18T04:13:32.1903215Z   rm -rf "${GITHUB_WORKSPACE}"
2022-05-18T04:13:32.1903478Z fi
2022-05-18T04:13:32.1903723Z mkdir "${GITHUB_WORKSPACE}"
2022-05-18T04:13:32.1922162Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
2022-05-18T04:13:32.1922498Z env:
2022-05-18T04:13:32.1922712Z   IN_CI: 1
2022-05-18T04:13:32.1922940Z   IS_GHA: 1
2022-05-18T04:13:32.1923205Z   GIT_DEFAULT_BRANCH: master
2022-05-18T04:13:32.1923483Z   NO_SUDO:
2022-05-18T04:13:32.1923722Z ##[endgroup]
2022-05-18T04:13:32.2146551Z /home/ec2-user/actions-runner/_work/pytorch/pytorch
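The "Download action repository" lines above show each composite action resolved to a pinned commit before the job starts. As an illustrative aside (not part of the job log), the same resolution can be checked from any machine with plain git; the repository URLs and refs below are taken from the log:

    # Tip of the branch used for 'pytorch/pytorch@master' at the time of the run
    git ls-remote https://github.com/pytorch/pytorch refs/heads/master
    # Ref 'v4' of the upload-artifact-s3 action (matches refs/heads/v4 or refs/tags/v4)
    git ls-remote https://github.com/seemethere/upload-artifact-s3 v4

Each command prints a commit SHA followed by the matching ref name; for the first command, the SHA GitHub recorded next to 'pytorch/pytorch@master' above was the master tip it saw at run time.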
2022-05-18T04:13:34.6976747Z ##[group]Run malfet/checkout@silent-checkout
2022-05-18T04:13:34.6977098Z with:
2022-05-18T04:13:34.6977382Z   ref: 3b2375291aab7b48442f2e6fb1ef66cebc761e24
2022-05-18T04:13:34.6977655Z   fetch-depth: 0
2022-05-18T04:13:34.6977923Z   submodules: recursive
2022-05-18T04:13:34.6978191Z   quiet-checkout: true
2022-05-18T04:13:34.6978455Z   repository: pytorch/pytorch
2022-05-18T04:13:34.6978916Z   token: ***
2022-05-18T04:13:34.6979172Z   ssh-strict: true
2022-05-18T04:13:34.6979449Z   persist-credentials: true
2022-05-18T04:13:34.6979700Z   clean: true
2022-05-18T04:13:34.6979930Z   lfs: false
2022-05-18T04:13:34.6980201Z   set-safe-directory: true
2022-05-18T04:13:34.6980440Z env:
2022-05-18T04:13:34.6980660Z   IN_CI: 1
2022-05-18T04:13:34.6980886Z   IS_GHA: 1
2022-05-18T04:13:34.6981121Z   GIT_DEFAULT_BRANCH: master
2022-05-18T04:13:34.6981381Z ##[endgroup]
2022-05-18T04:13:34.8513324Z Syncing repository: pytorch/pytorch
2022-05-18T04:13:34.8515194Z ##[group]Getting Git version info
2022-05-18T04:13:34.8515744Z Working directory is '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2022-05-18T04:13:34.8516335Z [command]/usr/bin/git version
2022-05-18T04:13:34.8516608Z git version 2.32.0
2022-05-18T04:13:34.8528686Z ##[endgroup]
2022-05-18T04:13:34.8550892Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/e28716f8-64c0-4874-82d8-adb6e6b44085' before making global git config changes
2022-05-18T04:13:34.8551463Z Adding repository directory to the temporary git global config as a safe directory
2022-05-18T04:13:34.8559611Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch
2022-05-18T04:13:34.8603990Z Deleting the contents of '/home/ec2-user/actions-runner/_work/pytorch/pytorch'
2022-05-18T04:13:34.8609774Z ##[group]Initializing the repository
2022-05-18T04:13:34.8616572Z [command]/usr/bin/git init /home/ec2-user/actions-runner/_work/pytorch/pytorch
2022-05-18T04:13:34.8650858Z hint: Using 'master' as the name for the initial branch. This default branch name
2022-05-18T04:13:34.8651393Z hint: is subject to change. To configure the initial branch name to use in all
2022-05-18T04:13:34.8652165Z hint: of your new repositories, which will suppress this warning, call:
2022-05-18T04:13:34.8652504Z hint:
2022-05-18T04:13:34.8652870Z hint: 	git config --global init.defaultBranch <name>
2022-05-18T04:13:34.8653167Z hint:
2022-05-18T04:13:34.8653554Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and
2022-05-18T04:13:34.8654040Z hint: 'development'. The just-created branch can be renamed via this command:
2022-05-18T04:13:34.8654368Z hint:
2022-05-18T04:13:34.8654827Z hint: 	git branch -m <name>
2022-05-18T04:13:34.8655330Z Initialized empty Git repository in /home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/
2022-05-18T04:13:34.8666766Z [command]/usr/bin/git remote add origin https://github.com/pytorch/pytorch
2022-05-18T04:13:34.8703070Z ##[endgroup]
2022-05-18T04:13:34.8703564Z ##[group]Disabling automatic garbage collection
2022-05-18T04:13:34.8709092Z [command]/usr/bin/git config --local gc.auto 0
2022-05-18T04:13:34.8741418Z ##[endgroup]
2022-05-18T04:13:34.8741865Z ##[group]Setting up auth
2022-05-18T04:13:34.8752174Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2022-05-18T04:13:34.8790666Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :
2022-05-18T04:13:34.9101849Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2022-05-18T04:13:34.9137534Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :
2022-05-18T04:13:34.9448445Z [command]/usr/bin/git config --local http.https://github.com/.extraheader AUTHORIZATION: basic ***
2022-05-18T04:13:34.9503258Z ##[endgroup]
2022-05-18T04:13:34.9503739Z ##[group]Fetching the repository
2022-05-18T04:13:34.9512708Z [command]/usr/bin/git -c protocol.version=2 fetch --prune --quiet --no-recurse-submodules origin +refs/heads/*:refs/remotes/origin/* +refs/tags/*:refs/tags/*
2022-05-18T04:14:17.1776708Z [command]/usr/bin/git rev-parse --verify --quiet 3b2375291aab7b48442f2e6fb1ef66cebc761e24^{object}
2022-05-18T04:14:17.1807273Z 3b2375291aab7b48442f2e6fb1ef66cebc761e24
2022-05-18T04:14:17.1816845Z ##[endgroup]
2022-05-18T04:14:17.1817823Z ##[group]Determining the checkout info
2022-05-18T04:14:17.1818791Z ##[endgroup]
2022-05-18T04:14:17.1819679Z ##[group]Checking out the ref
2022-05-18T04:14:17.1825038Z [command]/usr/bin/git checkout --quiet --force 3b2375291aab7b48442f2e6fb1ef66cebc761e24
2022-05-18T04:14:18.7645390Z ##[endgroup]
2022-05-18T04:14:18.7646224Z ##[group]Setting up auth for fetching submodules
2022-05-18T04:14:18.7653799Z [command]/usr/bin/git config --global http.https://github.com/.extraheader AUTHORIZATION: basic ***
2022-05-18T04:14:18.7710594Z [command]/usr/bin/git config --global --unset-all url.https://github.com/.insteadOf
2022-05-18T04:14:18.7745156Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf git@github.com:
2022-05-18T04:14:18.7779606Z [command]/usr/bin/git config --global --add url.https://github.com/.insteadOf org-21003710@github.com:
2022-05-18T04:14:18.7812036Z ##[endgroup]
2022-05-18T04:14:18.7812547Z ##[group]Fetching submodules
2022-05-18T04:14:18.7819018Z [command]/usr/bin/git submodule sync --recursive
2022-05-18T04:14:18.8149306Z [command]/usr/bin/git -c protocol.version=2 submodule update --init --force --recursive
2022-05-18T04:14:18.8461515Z Submodule 'android/libs/fbjni' (https://github.com/facebookincubator/fbjni.git) registered for path 'android/libs/fbjni'
2022-05-18T04:14:18.8463910Z Submodule 'third_party/NNPACK_deps/FP16' (https://github.com/Maratyszcza/FP16.git) registered for path 'third_party/FP16'
2022-05-18T04:14:18.8466853Z Submodule
'third_party/NNPACK_deps/FXdiv' (https://github.com/Maratyszcza/FXdiv.git) registered for path 'third_party/FXdiv' 2022-05-18T04:14:18.8470024Z Submodule 'third_party/NNPACK' (https://github.com/Maratyszcza/NNPACK.git) registered for path 'third_party/NNPACK' 2022-05-18T04:14:18.8473310Z Submodule 'third_party/QNNPACK' (https://github.com/pytorch/QNNPACK) registered for path 'third_party/QNNPACK' 2022-05-18T04:14:18.8476749Z Submodule 'third_party/XNNPACK' (https://github.com/google/XNNPACK.git) registered for path 'third_party/XNNPACK' 2022-05-18T04:14:18.8480506Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/benchmark' 2022-05-18T04:14:18.8484355Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo.git) registered for path 'third_party/cpuinfo' 2022-05-18T04:14:18.8487879Z Submodule 'third_party/cub' (https://github.com/NVlabs/cub.git) registered for path 'third_party/cub' 2022-05-18T04:14:18.8492656Z Submodule 'third_party/cudnn_frontend' (https://github.com/NVIDIA/cudnn-frontend.git) registered for path 'third_party/cudnn_frontend' 2022-05-18T04:14:18.8496570Z Submodule 'third_party/eigen' (https://gitlab.com/libeigen/eigen.git) registered for path 'third_party/eigen' 2022-05-18T04:14:18.8500862Z Submodule 'third_party/fbgemm' (https://github.com/pytorch/fbgemm) registered for path 'third_party/fbgemm' 2022-05-18T04:14:18.8505213Z Submodule 'third_party/flatbuffers' (https://github.com/google/flatbuffers.git) registered for path 'third_party/flatbuffers' 2022-05-18T04:14:18.8509786Z Submodule 'third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/fmt' 2022-05-18T04:14:18.8514321Z Submodule 'third_party/foxi' (https://github.com/houseroad/foxi.git) registered for path 'third_party/foxi' 2022-05-18T04:14:18.8519084Z Submodule 'third_party/gemmlowp/gemmlowp' (https://github.com/google/gemmlowp.git) registered for path 'third_party/gemmlowp/gemmlowp' 2022-05-18T04:14:18.8523869Z Submodule 'third_party/gloo' (https://github.com/facebookincubator/gloo) registered for path 'third_party/gloo' 2022-05-18T04:14:18.8528902Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/googletest' 2022-05-18T04:14:18.8534740Z Submodule 'third_party/ideep' (https://github.com/intel/ideep) registered for path 'third_party/ideep' 2022-05-18T04:14:18.8539889Z Submodule 'third_party/ios-cmake' (https://github.com/Yangqing/ios-cmake.git) registered for path 'third_party/ios-cmake' 2022-05-18T04:14:18.8545192Z Submodule 'third_party/kineto' (https://github.com/pytorch/kineto) registered for path 'third_party/kineto' 2022-05-18T04:14:18.8550757Z Submodule 'third_party/nccl/nccl' (https://github.com/NVIDIA/nccl) registered for path 'third_party/nccl/nccl' 2022-05-18T04:14:18.8556430Z Submodule 'third_party/neon2sse' (https://github.com/intel/ARM_NEON_2_x86_SSE.git) registered for path 'third_party/neon2sse' 2022-05-18T04:14:18.8562277Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx' 2022-05-18T04:14:18.8568150Z Submodule 'third_party/onnx-tensorrt' (https://github.com/onnx/onnx-tensorrt) registered for path 'third_party/onnx-tensorrt' 2022-05-18T04:14:18.8574827Z Submodule 'third_party/pocketfft' (https://github.com/mreineck/pocketfft) registered for path 'third_party/pocketfft' 2022-05-18T04:14:18.8581046Z Submodule 'third_party/protobuf' (https://github.com/protocolbuffers/protobuf.git) 
registered for path 'third_party/protobuf' 2022-05-18T04:14:18.8587359Z Submodule 'third_party/NNPACK_deps/psimd' (https://github.com/Maratyszcza/psimd.git) registered for path 'third_party/psimd' 2022-05-18T04:14:18.8594020Z Submodule 'third_party/NNPACK_deps/pthreadpool' (https://github.com/Maratyszcza/pthreadpool.git) registered for path 'third_party/pthreadpool' 2022-05-18T04:14:18.8600398Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/pybind11' 2022-05-18T04:14:18.8607131Z Submodule 'third_party/python-enum' (https://github.com/PeachPy/enum34.git) registered for path 'third_party/python-enum' 2022-05-18T04:14:18.8614709Z Submodule 'third_party/python-peachpy' (https://github.com/Maratyszcza/PeachPy.git) registered for path 'third_party/python-peachpy' 2022-05-18T04:14:18.8621549Z Submodule 'third_party/python-six' (https://github.com/benjaminp/six.git) registered for path 'third_party/python-six' 2022-05-18T04:14:18.8628617Z Submodule 'third_party/sleef' (https://github.com/shibatch/sleef) registered for path 'third_party/sleef' 2022-05-18T04:14:18.8635906Z Submodule 'third_party/tbb' (https://github.com/01org/tbb) registered for path 'third_party/tbb' 2022-05-18T04:14:18.8643201Z Submodule 'third_party/tensorpipe' (https://github.com/pytorch/tensorpipe.git) registered for path 'third_party/tensorpipe' 2022-05-18T04:14:18.8651311Z Submodule 'third_party/zstd' (https://github.com/facebook/zstd.git) registered for path 'third_party/zstd' 2022-05-18T04:14:18.8714552Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/android/libs/fbjni'... 2022-05-18T04:14:19.1040095Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FP16'... 2022-05-18T04:14:19.2854178Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/FXdiv'... 2022-05-18T04:14:19.4625281Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/NNPACK'... 2022-05-18T04:14:19.8051881Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/QNNPACK'... 2022-05-18T04:14:20.0306946Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'... 2022-05-18T04:14:23.4673624Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/benchmark'... 2022-05-18T04:14:23.7984921Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cpuinfo'... 2022-05-18T04:14:24.2533611Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cub'... 2022-05-18T04:14:25.3499650Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cudnn_frontend'... 2022-05-18T04:14:26.5138186Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/eigen'... 2022-05-18T04:14:31.2324590Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'... 2022-05-18T04:14:31.7254840Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'... 2022-05-18T04:14:32.6622037Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fmt'... 2022-05-18T04:14:33.5670701Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/foxi'... 2022-05-18T04:14:33.7573770Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gemmlowp/gemmlowp'... 2022-05-18T04:14:34.1783267Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/gloo'... 
2022-05-18T04:14:34.4444399Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/googletest'... 2022-05-18T04:14:35.2556106Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep'... 2022-05-18T04:14:35.5917375Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ios-cmake'... 2022-05-18T04:14:35.7834018Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'... 2022-05-18T04:14:37.2429485Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nccl/nccl'... 2022-05-18T04:14:37.5688467Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/neon2sse'... 2022-05-18T04:14:37.9257585Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'... 2022-05-18T04:14:39.0412552Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt'... 2022-05-18T04:14:39.4141788Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pocketfft'... 2022-05-18T04:14:39.6257010Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'... 2022-05-18T04:14:43.7635800Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/psimd'... 2022-05-18T04:14:43.9550606Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pthreadpool'... 2022-05-18T04:14:44.1599080Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'... 2022-05-18T04:14:44.8284693Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-enum'... 2022-05-18T04:14:45.0422181Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'... 2022-05-18T04:14:45.4262420Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-six'... 2022-05-18T04:14:45.6772831Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'... 2022-05-18T04:14:46.1503422Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tbb'... 2022-05-18T04:14:47.8736871Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'... 2022-05-18T04:14:48.3005803Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/zstd'... 
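The clone phase above covers every top-level submodule registered from .gitmodules; the "Submodule path ... checked out" lines that follow pin each clone to the exact commit recorded in the superproject. A small aside, assuming a finished local checkout and a shell at the repository root, for inspecting the same mapping with stock git:

    # Path and URL recorded for each submodule in .gitmodules
    git config --file .gitmodules --get-regexp 'submodule\..*\.path'
    git config --file .gitmodules --get-regexp 'submodule\..*\.url'
    # Commit every initialized submodule is currently checked out at
    git submodule status --recursive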
2022-05-18T04:14:50.9438614Z Submodule path 'android/libs/fbjni': checked out '7e1e1fe3858c63c251c637ae41a20de425dde96f' 2022-05-18T04:14:50.9845471Z Submodule path 'third_party/FP16': checked out '4dfe081cf6bcd15db339cf2680b9281b8451eeb3' 2022-05-18T04:14:51.0206978Z Submodule path 'third_party/FXdiv': checked out 'b408327ac2a15ec3e43352421954f5b1967701d1' 2022-05-18T04:14:51.0750414Z Submodule path 'third_party/NNPACK': checked out 'c07e3a0400713d546e0dea2d5466dd22ea389c73' 2022-05-18T04:14:51.1997349Z Submodule path 'third_party/QNNPACK': checked out '7d2a4e9931a82adc3814275b6219a03e24e36b4c' 2022-05-18T04:14:51.9643360Z Submodule path 'third_party/XNNPACK': checked out 'ae108ef49aa5623b896fc93d4298c49d1750d9ba' 2022-05-18T04:14:52.0174247Z Submodule path 'third_party/benchmark': checked out 'e991355c02b93fe17713efe04cbc2e278e00fdbd' 2022-05-18T04:14:52.1647588Z Submodule path 'third_party/cpuinfo': checked out '5916273f79a21551890fd3d56fc5375a78d1598d' 2022-05-18T04:14:52.2325872Z Submodule path 'third_party/cub': checked out 'd106ddb991a56c3df1b6d51b2409e36ba8181ce4' 2022-05-18T04:14:52.6090589Z Submodule path 'third_party/cudnn_frontend': checked out '43709ab96c47e26eebcdac72f93f946d44ceffa8' 2022-05-18T04:14:52.9289918Z Submodule path 'third_party/eigen': checked out '3147391d946bb4b6c68edd901f2add6ac1f31f8c' 2022-05-18T04:14:53.0107749Z Submodule path 'third_party/fbgemm': checked out '2e9be65810107a9595da717f95d21924b73be833' 2022-05-18T04:14:53.0159103Z Submodule 'third_party/asmjit' (https://github.com/asmjit/asmjit.git) registered for path 'third_party/fbgemm/third_party/asmjit' 2022-05-18T04:14:53.0162019Z Submodule 'third_party/cpuinfo' (https://github.com/pytorch/cpuinfo) registered for path 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T04:14:53.0165124Z Submodule 'third_party/googletest' (https://github.com/google/googletest) registered for path 'third_party/fbgemm/third_party/googletest' 2022-05-18T04:14:53.0209121Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/asmjit'... 2022-05-18T04:14:53.6572541Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/cpuinfo'... 2022-05-18T04:14:54.1162522Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm/third_party/googletest'... 
2022-05-18T04:14:54.9631931Z Submodule path 'third_party/fbgemm/third_party/asmjit': checked out '8b35b4cffb62ecb58a903bf91cb7537d7a672211' 2022-05-18T04:14:55.1133788Z Submodule path 'third_party/fbgemm/third_party/cpuinfo': checked out 'ed8b86a253800bafdb7b25c5c399f91bff9cb1f3' 2022-05-18T04:14:55.2091749Z Submodule path 'third_party/fbgemm/third_party/googletest': checked out 'cbf019de22c8dd37b2108da35b2748fd702d1796' 2022-05-18T04:14:55.3417407Z Submodule path 'third_party/flatbuffers': checked out 'd0cede9c90c5257537c293517a21376408b549fa' 2022-05-18T04:14:55.4097181Z Submodule path 'third_party/fmt': checked out 'cd4af11efc9c622896a3e4cb599fa28668ca3d05' 2022-05-18T04:14:55.4458407Z Submodule path 'third_party/foxi': checked out 'c278588e34e535f0bb8f00df3880d26928038cad' 2022-05-18T04:14:55.5186880Z Submodule path 'third_party/gemmlowp/gemmlowp': checked out '3fb5c176c17c765a3492cd2f0321b0dab712f350' 2022-05-18T04:14:55.5743236Z Submodule path 'third_party/gloo': checked out 'c22a5cfba94edf8ea4f53a174d38aa0c629d070f' 2022-05-18T04:14:55.6557689Z Submodule path 'third_party/googletest': checked out 'e2239ee6043f73722e7aa812a459f54a28552929' 2022-05-18T04:14:55.6938698Z Submodule path 'third_party/ideep': checked out '02b17c5748c9349dcc586c359af800c684d9b1ab' 2022-05-18T04:14:55.6989029Z Submodule 'mkl-dnn' (https://github.com/intel/mkl-dnn.git) registered for path 'third_party/ideep/mkl-dnn' 2022-05-18T04:14:55.7033059Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn'... 2022-05-18T04:15:00.7099801Z Submodule path 'third_party/ideep/mkl-dnn': checked out '888a87a954e4fddb4d81fd10858eb834f2441b46' 2022-05-18T04:15:00.7163928Z Submodule 'third_party/oneDNN' (https://github.com/oneapi-src/oneDNN.git) registered for path 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T04:15:00.7213435Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/ideep/mkl-dnn/third_party/oneDNN'... 2022-05-18T04:15:05.8149043Z Submodule path 'third_party/ideep/mkl-dnn/third_party/oneDNN': checked out '52b5f107dd9cf10910aaa19cb47f3abf9b349815' 2022-05-18T04:15:05.8554695Z Submodule path 'third_party/ios-cmake': checked out '8abaed637d56f1337d6e1d2c4026e25c1eade724' 2022-05-18T04:15:05.9940290Z Submodule path 'third_party/kineto': checked out 'b2b48c00c6e5bd8e807e2231adb229db6a1d1c22' 2022-05-18T04:15:05.9993099Z Submodule 'libkineto/third_party/fmt' (https://github.com/fmtlib/fmt.git) registered for path 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T04:15:05.9996112Z Submodule 'libkineto/third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T04:15:06.0042344Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/fmt'... 2022-05-18T04:15:06.9055923Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto/libkineto/third_party/googletest'... 
2022-05-18T04:15:07.7865268Z Submodule path 'third_party/kineto/libkineto/third_party/fmt': checked out '2591ab91c3898c9f6544fff04660276537d32ffd' 2022-05-18T04:15:07.8757433Z Submodule path 'third_party/kineto/libkineto/third_party/googletest': checked out '7aca84427f224eeed3144123d5230d5871e93347' 2022-05-18T04:15:07.9263839Z Submodule path 'third_party/nccl/nccl': checked out '7e515921295adaab72adf56ea71a0fafb0ecb5f3' 2022-05-18T04:15:07.9681223Z Submodule path 'third_party/neon2sse': checked out '97a126f08ce318023be604d03f88bf0820a9464a' 2022-05-18T04:15:08.2946194Z Submodule path 'third_party/onnx': checked out '96046b8ccfb8e6fa82f6b2b34b3d56add2e8849c' 2022-05-18T04:15:08.3011583Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx/third_party/benchmark' 2022-05-18T04:15:08.3014891Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx/third_party/pybind11' 2022-05-18T04:15:08.3074332Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/benchmark'... 2022-05-18T04:15:08.6440153Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx/third_party/pybind11'... 2022-05-18T04:15:09.3321929Z Submodule path 'third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508' 2022-05-18T04:15:09.3960945Z Submodule path 'third_party/onnx/third_party/pybind11': checked out '59a2ac2745d8a57ac94c6accced73620d59fb844' 2022-05-18T04:15:09.4402996Z Submodule path 'third_party/onnx-tensorrt': checked out 'c153211418a7c57ce071d9ce2a41f8d1c85a878f' 2022-05-18T04:15:09.4455351Z Submodule 'third_party/onnx' (https://github.com/onnx/onnx.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T04:15:09.4498224Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx'... 2022-05-18T04:15:10.7931003Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx': checked out '765f5ee823a67a866f4bd28a9860e81f3c811ce8' 2022-05-18T04:15:10.7996400Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T04:15:10.7999052Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T04:15:10.8052392Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark'... 2022-05-18T04:15:11.1353148Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11'... 2022-05-18T04:15:11.8284565Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark': checked out 'e776aa0275e293707b6a0901e0e8d8a8a3679508' 2022-05-18T04:15:11.9309886Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11': checked out 'a1041190c8b8ff0cd9e2f0752248ad5e3789ea0c' 2022-05-18T04:15:11.9367825Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T04:15:11.9412938Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang'... 
2022-05-18T04:15:12.1653351Z Submodule path 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2022-05-18T04:15:12.2038279Z Submodule path 'third_party/pocketfft': checked out 'ea778e37710c07723435b1be58235996d1d43a5a' 2022-05-18T04:15:12.5505193Z Submodule path 'third_party/protobuf': checked out 'd1eca4e4b421cd2997495c4b4e65cea6be4e9b8a' 2022-05-18T04:15:12.5554742Z Submodule 'third_party/benchmark' (https://github.com/google/benchmark.git) registered for path 'third_party/protobuf/third_party/benchmark' 2022-05-18T04:15:12.5557511Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/protobuf/third_party/googletest' 2022-05-18T04:15:12.5608101Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/benchmark'... 2022-05-18T04:15:12.9259596Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf/third_party/googletest'... 2022-05-18T04:15:13.7525593Z Submodule path 'third_party/protobuf/third_party/benchmark': checked out '5b7683f49e1e9223cf9927b24f6fd3d6bd82e3f8' 2022-05-18T04:15:13.8602816Z Submodule path 'third_party/protobuf/third_party/googletest': checked out '5ec7f0c4a113e2f18ac2c6cc7df51ad6afc24081' 2022-05-18T04:15:13.8980031Z Submodule path 'third_party/psimd': checked out '072586a71b55b7f8c584153d223e95687148a900' 2022-05-18T04:15:13.9366235Z Submodule path 'third_party/pthreadpool': checked out 'a134dd5d4cee80cce15db81a72e7f929d71dd413' 2022-05-18T04:15:13.9987260Z Submodule path 'third_party/pybind11': checked out '8de7772cc72daca8e947b79b83fea46214931604' 2022-05-18T04:15:14.0350382Z Submodule path 'third_party/python-enum': checked out '4cfedc426c4e2fc52e3f5c2b4297e15ed8d6b8c7' 2022-05-18T04:15:14.0955471Z Submodule path 'third_party/python-peachpy': checked out '07d8fde8ac45d7705129475c0f94ed8925b93473' 2022-05-18T04:15:14.1323570Z Submodule path 'third_party/python-six': checked out '15e31431af97e5e64b80af0a3f598d382bcdd49a' 2022-05-18T04:15:14.2119799Z Submodule path 'third_party/sleef': checked out 'e0a003ee838b75d11763aa9c3ef17bf71a725bff' 2022-05-18T04:15:14.3754349Z Submodule path 'third_party/tbb': checked out 'a51a90bc609bb73db8ea13841b5cf7aa4344d4a9' 2022-05-18T04:15:14.4340235Z Submodule path 'third_party/tensorpipe': checked out '52791a2fd214b2a9dc5759d36725909c1daa7f2e' 2022-05-18T04:15:14.4391153Z Submodule 'third_party/googletest' (https://github.com/google/googletest.git) registered for path 'third_party/tensorpipe/third_party/googletest' 2022-05-18T04:15:14.4394268Z Submodule 'third_party/libnop' (https://github.com/google/libnop.git) registered for path 'third_party/tensorpipe/third_party/libnop' 2022-05-18T04:15:14.4397294Z Submodule 'third_party/libuv' (https://github.com/libuv/libuv.git) registered for path 'third_party/tensorpipe/third_party/libuv' 2022-05-18T04:15:14.4400570Z Submodule 'third_party/pybind11' (https://github.com/pybind/pybind11.git) registered for path 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T04:15:14.4445789Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/googletest'... 2022-05-18T04:15:15.2407299Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libnop'... 2022-05-18T04:15:15.4768732Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/libuv'... 
2022-05-18T04:15:16.6354332Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11'... 2022-05-18T04:15:17.3434921Z Submodule path 'third_party/tensorpipe/third_party/googletest': checked out 'aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e' 2022-05-18T04:15:17.3869221Z Submodule path 'third_party/tensorpipe/third_party/libnop': checked out '910b55815be16109f04f4180e9adee14fb4ce281' 2022-05-18T04:15:17.4924127Z Submodule path 'third_party/tensorpipe/third_party/libuv': checked out '1dff88e5161cba5c59276d2070d2e304e4dcb242' 2022-05-18T04:15:17.5514392Z Submodule path 'third_party/tensorpipe/third_party/pybind11': checked out 'a23996fce38ff6ccfbcdc09f1e63f2c4be5ea2ef' 2022-05-18T04:15:17.5572036Z Submodule 'tools/clang' (https://github.com/wjakob/clang-cindex-python3) registered for path 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T04:15:17.5615359Z Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe/third_party/pybind11/tools/clang'... 2022-05-18T04:15:17.7985962Z Submodule path 'third_party/tensorpipe/third_party/pybind11/tools/clang': checked out '6a00cbc4a9b8e68b71caf7f774b3f9c753ae84d5' 2022-05-18T04:15:17.9821244Z Submodule path 'third_party/zstd': checked out 'aec56a52fbab207fc639a1937d1e708a282edca8' 2022-05-18T04:15:17.9912042Z [command]/usr/bin/git submodule foreach --recursive git config --local gc.auto 0 2022-05-18T04:15:18.0239162Z Entering 'android/libs/fbjni' 2022-05-18T04:15:18.0282226Z Entering 'third_party/FP16' 2022-05-18T04:15:18.0325016Z Entering 'third_party/FXdiv' 2022-05-18T04:15:18.0367009Z Entering 'third_party/NNPACK' 2022-05-18T04:15:18.0409150Z Entering 'third_party/QNNPACK' 2022-05-18T04:15:18.0452850Z Entering 'third_party/XNNPACK' 2022-05-18T04:15:18.0506211Z Entering 'third_party/benchmark' 2022-05-18T04:15:18.0548462Z Entering 'third_party/cpuinfo' 2022-05-18T04:15:18.0591449Z Entering 'third_party/cub' 2022-05-18T04:15:18.0633985Z Entering 'third_party/cudnn_frontend' 2022-05-18T04:15:18.0681711Z Entering 'third_party/eigen' 2022-05-18T04:15:18.0726394Z Entering 'third_party/fbgemm' 2022-05-18T04:15:18.0768349Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T04:15:18.0809656Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T04:15:18.0851729Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T04:15:18.0895179Z Entering 'third_party/flatbuffers' 2022-05-18T04:15:18.0940001Z Entering 'third_party/fmt' 2022-05-18T04:15:18.0982755Z Entering 'third_party/foxi' 2022-05-18T04:15:18.1024934Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T04:15:18.1065758Z Entering 'third_party/gloo' 2022-05-18T04:15:18.1108268Z Entering 'third_party/googletest' 2022-05-18T04:15:18.1149877Z Entering 'third_party/ideep' 2022-05-18T04:15:18.1190597Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T04:15:18.1233683Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T04:15:18.1280784Z Entering 'third_party/ios-cmake' 2022-05-18T04:15:18.1322970Z Entering 'third_party/kineto' 2022-05-18T04:15:18.1364076Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T04:15:18.1405191Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T04:15:18.1449441Z Entering 'third_party/nccl/nccl' 2022-05-18T04:15:18.1491842Z Entering 'third_party/neon2sse' 2022-05-18T04:15:18.1533676Z Entering 'third_party/onnx' 2022-05-18T04:15:18.1588574Z Entering 'third_party/onnx/third_party/benchmark' 
2022-05-18T04:15:18.1630959Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T04:15:18.1674371Z Entering 'third_party/onnx-tensorrt' 2022-05-18T04:15:18.1716835Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T04:15:18.1763904Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T04:15:18.1806430Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T04:15:18.1847677Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T04:15:18.1893909Z Entering 'third_party/pocketfft' 2022-05-18T04:15:18.1935843Z Entering 'third_party/protobuf' 2022-05-18T04:15:18.1980893Z Entering 'third_party/protobuf/third_party/benchmark' 2022-05-18T04:15:18.2021111Z Entering 'third_party/protobuf/third_party/googletest' 2022-05-18T04:15:18.2063351Z Entering 'third_party/psimd' 2022-05-18T04:15:18.2104768Z Entering 'third_party/pthreadpool' 2022-05-18T04:15:18.2145008Z Entering 'third_party/pybind11' 2022-05-18T04:15:18.2186289Z Entering 'third_party/python-enum' 2022-05-18T04:15:18.2227868Z Entering 'third_party/python-peachpy' 2022-05-18T04:15:18.2269921Z Entering 'third_party/python-six' 2022-05-18T04:15:18.2311065Z Entering 'third_party/sleef' 2022-05-18T04:15:18.2352751Z Entering 'third_party/tbb' 2022-05-18T04:15:18.2395813Z Entering 'third_party/tensorpipe' 2022-05-18T04:15:18.2437564Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-05-18T04:15:18.2478291Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-05-18T04:15:18.2518850Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-05-18T04:15:18.2559799Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T04:15:18.2599697Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T04:15:18.2643343Z Entering 'third_party/zstd' 2022-05-18T04:15:18.2695965Z ##[endgroup] 2022-05-18T04:15:18.2698819Z ##[group]Persisting credentials for submodules 2022-05-18T04:15:18.2705399Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'url\.https\:\/\/github\.com\/\.insteadOf' && git config --local --unset-all 'url.https://github.com/.insteadOf' || : 2022-05-18T04:15:18.3022980Z Entering 'android/libs/fbjni' 2022-05-18T04:15:18.3063823Z Entering 'third_party/FP16' 2022-05-18T04:15:18.3104915Z Entering 'third_party/FXdiv' 2022-05-18T04:15:18.3146334Z Entering 'third_party/NNPACK' 2022-05-18T04:15:18.3188350Z Entering 'third_party/QNNPACK' 2022-05-18T04:15:18.3229164Z Entering 'third_party/XNNPACK' 2022-05-18T04:15:18.3281273Z Entering 'third_party/benchmark' 2022-05-18T04:15:18.3322315Z Entering 'third_party/cpuinfo' 2022-05-18T04:15:18.3364801Z Entering 'third_party/cub' 2022-05-18T04:15:18.3405445Z Entering 'third_party/cudnn_frontend' 2022-05-18T04:15:18.3452157Z Entering 'third_party/eigen' 2022-05-18T04:15:18.3495469Z Entering 'third_party/fbgemm' 2022-05-18T04:15:18.3535958Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T04:15:18.3576034Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T04:15:18.3616862Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T04:15:18.3658018Z Entering 'third_party/flatbuffers' 2022-05-18T04:15:18.3701031Z Entering 'third_party/fmt' 2022-05-18T04:15:18.3741160Z Entering 'third_party/foxi' 2022-05-18T04:15:18.3781607Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T04:15:18.3822469Z Entering 'third_party/gloo' 2022-05-18T04:15:18.3863871Z Entering 
'third_party/googletest' 2022-05-18T04:15:18.3903841Z Entering 'third_party/ideep' 2022-05-18T04:15:18.3943530Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T04:15:18.3984965Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T04:15:18.4031259Z Entering 'third_party/ios-cmake' 2022-05-18T04:15:18.4071248Z Entering 'third_party/kineto' 2022-05-18T04:15:18.4111749Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T04:15:18.4151808Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T04:15:18.4193721Z Entering 'third_party/nccl/nccl' 2022-05-18T04:15:18.4233873Z Entering 'third_party/neon2sse' 2022-05-18T04:15:18.4274533Z Entering 'third_party/onnx' 2022-05-18T04:15:18.4326431Z Entering 'third_party/onnx/third_party/benchmark' 2022-05-18T04:15:18.4368622Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T04:15:18.4410522Z Entering 'third_party/onnx-tensorrt' 2022-05-18T04:15:18.4450492Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T04:15:18.4495494Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T04:15:18.4537648Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T04:15:18.4578286Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T04:15:18.4623156Z Entering 'third_party/pocketfft' 2022-05-18T04:15:18.4663832Z Entering 'third_party/protobuf' 2022-05-18T04:15:18.4709208Z Entering 'third_party/protobuf/third_party/benchmark' 2022-05-18T04:15:18.4749681Z Entering 'third_party/protobuf/third_party/googletest' 2022-05-18T04:15:18.4792301Z Entering 'third_party/psimd' 2022-05-18T04:15:18.4832854Z Entering 'third_party/pthreadpool' 2022-05-18T04:15:18.4873567Z Entering 'third_party/pybind11' 2022-05-18T04:15:18.4914234Z Entering 'third_party/python-enum' 2022-05-18T04:15:18.4954386Z Entering 'third_party/python-peachpy' 2022-05-18T04:15:18.4996930Z Entering 'third_party/python-six' 2022-05-18T04:15:18.5036578Z Entering 'third_party/sleef' 2022-05-18T04:15:18.5076659Z Entering 'third_party/tbb' 2022-05-18T04:15:18.5119401Z Entering 'third_party/tensorpipe' 2022-05-18T04:15:18.5159653Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-05-18T04:15:18.5200368Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-05-18T04:15:18.5241390Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-05-18T04:15:18.5282113Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T04:15:18.5322854Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T04:15:18.5367862Z Entering 'third_party/zstd' 2022-05-18T04:15:18.5422125Z [command]/usr/bin/git submodule foreach --recursive git config --local 'http.https://github.com/.extraheader' 'AUTHORIZATION: basic ***' && git config --local --show-origin --name-only --get-regexp remote.origin.url 2022-05-18T04:15:18.5739712Z Entering 'android/libs/fbjni' 2022-05-18T04:15:18.5798480Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/android/libs/fbjni/config remote.origin.url 2022-05-18T04:15:18.5798980Z Entering 'third_party/FP16' 2022-05-18T04:15:18.5834605Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FP16/config remote.origin.url 2022-05-18T04:15:18.5851999Z Entering 'third_party/FXdiv' 2022-05-18T04:15:18.5889450Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/FXdiv/config remote.origin.url 
2022-05-18T04:15:18.5906779Z Entering 'third_party/NNPACK' 2022-05-18T04:15:18.5945662Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK/config remote.origin.url 2022-05-18T04:15:18.5962973Z Entering 'third_party/QNNPACK' 2022-05-18T04:15:18.6002233Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/QNNPACK/config remote.origin.url 2022-05-18T04:15:18.6019514Z Entering 'third_party/XNNPACK' 2022-05-18T04:15:18.6058669Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/XNNPACK/config remote.origin.url 2022-05-18T04:15:18.6086078Z Entering 'third_party/benchmark' 2022-05-18T04:15:18.6124036Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/benchmark/config remote.origin.url 2022-05-18T04:15:18.6141552Z Entering 'third_party/cpuinfo' 2022-05-18T04:15:18.6179022Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cpuinfo/config remote.origin.url 2022-05-18T04:15:18.6196396Z Entering 'third_party/cub' 2022-05-18T04:15:18.6234020Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cub/config remote.origin.url 2022-05-18T04:15:18.6251215Z Entering 'third_party/cudnn_frontend' 2022-05-18T04:15:18.6288252Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/cudnn_frontend/config remote.origin.url 2022-05-18T04:15:18.6311469Z Entering 'third_party/eigen' 2022-05-18T04:15:18.6349328Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/eigen/config remote.origin.url 2022-05-18T04:15:18.6368392Z Entering 'third_party/fbgemm' 2022-05-18T04:15:18.6407426Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/config remote.origin.url 2022-05-18T04:15:18.6424843Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T04:15:18.6461971Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/asmjit/config remote.origin.url 2022-05-18T04:15:18.6478877Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T04:15:18.6516878Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/cpuinfo/config remote.origin.url 2022-05-18T04:15:18.6534346Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T04:15:18.6571791Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fbgemm/modules/third_party/googletest/config remote.origin.url 2022-05-18T04:15:18.6590519Z Entering 'third_party/flatbuffers' 2022-05-18T04:15:18.6629284Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/flatbuffers/config remote.origin.url 2022-05-18T04:15:18.6649389Z Entering 'third_party/fmt' 2022-05-18T04:15:18.6687777Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/fmt/config remote.origin.url 2022-05-18T04:15:18.6706478Z Entering 'third_party/foxi' 2022-05-18T04:15:18.6744799Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/foxi/config remote.origin.url 2022-05-18T04:15:18.6761153Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T04:15:18.6800057Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gemmlowp/gemmlowp/config remote.origin.url 2022-05-18T04:15:18.6817334Z Entering 'third_party/gloo' 2022-05-18T04:15:18.6855098Z 
file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/gloo/config remote.origin.url 2022-05-18T04:15:18.6871966Z Entering 'third_party/googletest' 2022-05-18T04:15:18.6909724Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/googletest/config remote.origin.url 2022-05-18T04:15:18.6926263Z Entering 'third_party/ideep' 2022-05-18T04:15:18.6964118Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/config remote.origin.url 2022-05-18T04:15:18.6980359Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T04:15:18.7017812Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/config remote.origin.url 2022-05-18T04:15:18.7036443Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T04:15:18.7074991Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ideep/modules/mkl-dnn/modules/third_party/oneDNN/config remote.origin.url 2022-05-18T04:15:18.7098679Z Entering 'third_party/ios-cmake' 2022-05-18T04:15:18.7137454Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/ios-cmake/config remote.origin.url 2022-05-18T04:15:18.7153774Z Entering 'third_party/kineto' 2022-05-18T04:15:18.7192438Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/config remote.origin.url 2022-05-18T04:15:18.7208527Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T04:15:18.7246281Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/fmt/config remote.origin.url 2022-05-18T04:15:18.7263635Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T04:15:18.7300694Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/kineto/modules/libkineto/third_party/googletest/config remote.origin.url 2022-05-18T04:15:18.7318502Z Entering 'third_party/nccl/nccl' 2022-05-18T04:15:18.7357618Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/nccl/nccl/config remote.origin.url 2022-05-18T04:15:18.7375134Z Entering 'third_party/neon2sse' 2022-05-18T04:15:18.7413208Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/neon2sse/config remote.origin.url 2022-05-18T04:15:18.7429391Z Entering 'third_party/onnx' 2022-05-18T04:15:18.7468236Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/config remote.origin.url 2022-05-18T04:15:18.7497236Z Entering 'third_party/onnx/third_party/benchmark' 2022-05-18T04:15:18.7535278Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url 2022-05-18T04:15:18.7552127Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T04:15:18.7589723Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2022-05-18T04:15:18.7608543Z Entering 'third_party/onnx-tensorrt' 2022-05-18T04:15:18.7645943Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/config remote.origin.url 2022-05-18T04:15:18.7663055Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T04:15:18.7701711Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/config remote.origin.url 
2022-05-18T04:15:18.7723369Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T04:15:18.7762011Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/benchmark/config remote.origin.url 2022-05-18T04:15:18.7779544Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T04:15:18.7818180Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/config remote.origin.url 2022-05-18T04:15:18.7835183Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T04:15:18.7873747Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/onnx-tensorrt/modules/third_party/onnx/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2022-05-18T04:15:18.7895414Z Entering 'third_party/pocketfft' 2022-05-18T04:15:18.7933452Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pocketfft/config remote.origin.url 2022-05-18T04:15:18.7949943Z Entering 'third_party/protobuf' 2022-05-18T04:15:18.7988383Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/config remote.origin.url 2022-05-18T04:15:18.8008642Z Entering 'third_party/protobuf/third_party/benchmark' 2022-05-18T04:15:18.8046045Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/benchmark/config remote.origin.url 2022-05-18T04:15:18.8063400Z Entering 'third_party/protobuf/third_party/googletest' 2022-05-18T04:15:18.8101636Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/protobuf/modules/third_party/googletest/config remote.origin.url 2022-05-18T04:15:18.8120579Z Entering 'third_party/psimd' 2022-05-18T04:15:18.8158484Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/psimd/config remote.origin.url 2022-05-18T04:15:18.8175551Z Entering 'third_party/pthreadpool' 2022-05-18T04:15:18.8213439Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/NNPACK_deps/pthreadpool/config remote.origin.url 2022-05-18T04:15:18.8229762Z Entering 'third_party/pybind11' 2022-05-18T04:15:18.8267708Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/pybind11/config remote.origin.url 2022-05-18T04:15:18.8284469Z Entering 'third_party/python-enum' 2022-05-18T04:15:18.8322489Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-enum/config remote.origin.url 2022-05-18T04:15:18.8338936Z Entering 'third_party/python-peachpy' 2022-05-18T04:15:18.8376666Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-peachpy/config remote.origin.url 2022-05-18T04:15:18.8393035Z Entering 'third_party/python-six' 2022-05-18T04:15:18.8432182Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/python-six/config remote.origin.url 2022-05-18T04:15:18.8448759Z Entering 'third_party/sleef' 2022-05-18T04:15:18.8486962Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/sleef/config remote.origin.url 2022-05-18T04:15:18.8504237Z Entering 'third_party/tbb' 2022-05-18T04:15:18.8541867Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tbb/config 
remote.origin.url 2022-05-18T04:15:18.8560695Z Entering 'third_party/tensorpipe' 2022-05-18T04:15:18.8599551Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/config remote.origin.url 2022-05-18T04:15:18.8617353Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-05-18T04:15:18.8655163Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/googletest/config remote.origin.url 2022-05-18T04:15:18.8671689Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-05-18T04:15:18.8710845Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libnop/config remote.origin.url 2022-05-18T04:15:18.8727372Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-05-18T04:15:18.8765746Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/libuv/config remote.origin.url 2022-05-18T04:15:18.8783317Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T04:15:18.8821456Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/config remote.origin.url 2022-05-18T04:15:18.8837419Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T04:15:18.8876017Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/tensorpipe/modules/third_party/pybind11/modules/tools/clang/config remote.origin.url 2022-05-18T04:15:18.8896378Z Entering 'third_party/zstd' 2022-05-18T04:15:18.8934639Z file:/home/ec2-user/actions-runner/_work/pytorch/pytorch/.git/modules/third_party/zstd/config remote.origin.url 2022-05-18T04:15:18.9661179Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'git@github.com:' 2022-05-18T04:15:18.9978830Z Entering 'android/libs/fbjni' 2022-05-18T04:15:19.0019915Z Entering 'third_party/FP16' 2022-05-18T04:15:19.0062130Z Entering 'third_party/FXdiv' 2022-05-18T04:15:19.0103228Z Entering 'third_party/NNPACK' 2022-05-18T04:15:19.0144583Z Entering 'third_party/QNNPACK' 2022-05-18T04:15:19.0187006Z Entering 'third_party/XNNPACK' 2022-05-18T04:15:19.0239705Z Entering 'third_party/benchmark' 2022-05-18T04:15:19.0281131Z Entering 'third_party/cpuinfo' 2022-05-18T04:15:19.0323253Z Entering 'third_party/cub' 2022-05-18T04:15:19.0365114Z Entering 'third_party/cudnn_frontend' 2022-05-18T04:15:19.0412071Z Entering 'third_party/eigen' 2022-05-18T04:15:19.0455599Z Entering 'third_party/fbgemm' 2022-05-18T04:15:19.0497392Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T04:15:19.0538109Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T04:15:19.0579422Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T04:15:19.0621939Z Entering 'third_party/flatbuffers' 2022-05-18T04:15:19.0665846Z Entering 'third_party/fmt' 2022-05-18T04:15:19.0706891Z Entering 'third_party/foxi' 2022-05-18T04:15:19.0748359Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T04:15:19.0790393Z Entering 'third_party/gloo' 2022-05-18T04:15:19.0832017Z Entering 'third_party/googletest' 2022-05-18T04:15:19.0873775Z Entering 'third_party/ideep' 2022-05-18T04:15:19.0914785Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T04:15:19.0958643Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T04:15:19.1006985Z Entering 'third_party/ios-cmake' 2022-05-18T04:15:19.1049557Z Entering 
'third_party/kineto' 2022-05-18T04:15:19.1091448Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T04:15:19.1133351Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T04:15:19.1202615Z Entering 'third_party/nccl/nccl' 2022-05-18T04:15:19.1244992Z Entering 'third_party/neon2sse' 2022-05-18T04:15:19.1287220Z Entering 'third_party/onnx' 2022-05-18T04:15:19.1342925Z Entering 'third_party/onnx/third_party/benchmark' 2022-05-18T04:15:19.1384671Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T04:15:19.1428587Z Entering 'third_party/onnx-tensorrt' 2022-05-18T04:15:19.1469721Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T04:15:19.1516596Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T04:15:19.1560765Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T04:15:19.1602642Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T04:15:19.1648433Z Entering 'third_party/pocketfft' 2022-05-18T04:15:19.1690621Z Entering 'third_party/protobuf' 2022-05-18T04:15:19.1736033Z Entering 'third_party/protobuf/third_party/benchmark' 2022-05-18T04:15:19.1777530Z Entering 'third_party/protobuf/third_party/googletest' 2022-05-18T04:15:19.1821733Z Entering 'third_party/psimd' 2022-05-18T04:15:19.1863765Z Entering 'third_party/pthreadpool' 2022-05-18T04:15:19.1905211Z Entering 'third_party/pybind11' 2022-05-18T04:15:19.1947753Z Entering 'third_party/python-enum' 2022-05-18T04:15:19.1988716Z Entering 'third_party/python-peachpy' 2022-05-18T04:15:19.2030235Z Entering 'third_party/python-six' 2022-05-18T04:15:19.2072692Z Entering 'third_party/sleef' 2022-05-18T04:15:19.2114036Z Entering 'third_party/tbb' 2022-05-18T04:15:19.2157678Z Entering 'third_party/tensorpipe' 2022-05-18T04:15:19.2199706Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-05-18T04:15:19.2241852Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-05-18T04:15:19.2283442Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-05-18T04:15:19.2324980Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T04:15:19.2365522Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T04:15:19.2409414Z Entering 'third_party/zstd' 2022-05-18T04:15:19.2465080Z [command]/usr/bin/git submodule foreach --recursive git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:' 2022-05-18T04:15:19.2782908Z Entering 'android/libs/fbjni' 2022-05-18T04:15:19.2824943Z Entering 'third_party/FP16' 2022-05-18T04:15:19.2866257Z Entering 'third_party/FXdiv' 2022-05-18T04:15:19.2907605Z Entering 'third_party/NNPACK' 2022-05-18T04:15:19.2950102Z Entering 'third_party/QNNPACK' 2022-05-18T04:15:19.2991888Z Entering 'third_party/XNNPACK' 2022-05-18T04:15:19.3044233Z Entering 'third_party/benchmark' 2022-05-18T04:15:19.3085901Z Entering 'third_party/cpuinfo' 2022-05-18T04:15:19.3127368Z Entering 'third_party/cub' 2022-05-18T04:15:19.3170393Z Entering 'third_party/cudnn_frontend' 2022-05-18T04:15:19.3216838Z Entering 'third_party/eigen' 2022-05-18T04:15:19.3261757Z Entering 'third_party/fbgemm' 2022-05-18T04:15:19.3302861Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T04:15:19.3343609Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T04:15:19.3385894Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T04:15:19.3428080Z Entering 'third_party/flatbuffers' 2022-05-18T04:15:19.3472204Z 
Entering 'third_party/fmt' 2022-05-18T04:15:19.3513771Z Entering 'third_party/foxi' 2022-05-18T04:15:19.3556222Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T04:15:19.3597506Z Entering 'third_party/gloo' 2022-05-18T04:15:19.3639167Z Entering 'third_party/googletest' 2022-05-18T04:15:19.3680402Z Entering 'third_party/ideep' 2022-05-18T04:15:19.3721335Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T04:15:19.3763982Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T04:15:19.3812468Z Entering 'third_party/ios-cmake' 2022-05-18T04:15:19.3853616Z Entering 'third_party/kineto' 2022-05-18T04:15:19.3895601Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T04:15:19.3937463Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T04:15:19.3980284Z Entering 'third_party/nccl/nccl' 2022-05-18T04:15:19.4022797Z Entering 'third_party/neon2sse' 2022-05-18T04:15:19.4063752Z Entering 'third_party/onnx' 2022-05-18T04:15:19.4116010Z Entering 'third_party/onnx/third_party/benchmark' 2022-05-18T04:15:19.4157950Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T04:15:19.4201510Z Entering 'third_party/onnx-tensorrt' 2022-05-18T04:15:19.4242312Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T04:15:19.4289297Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T04:15:19.4330508Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T04:15:19.4372706Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T04:15:19.4419706Z Entering 'third_party/pocketfft' 2022-05-18T04:15:19.4461432Z Entering 'third_party/protobuf' 2022-05-18T04:15:19.4507427Z Entering 'third_party/protobuf/third_party/benchmark' 2022-05-18T04:15:19.4549388Z Entering 'third_party/protobuf/third_party/googletest' 2022-05-18T04:15:19.4592500Z Entering 'third_party/psimd' 2022-05-18T04:15:19.4634026Z Entering 'third_party/pthreadpool' 2022-05-18T04:15:19.4675941Z Entering 'third_party/pybind11' 2022-05-18T04:15:19.4718858Z Entering 'third_party/python-enum' 2022-05-18T04:15:19.4760944Z Entering 'third_party/python-peachpy' 2022-05-18T04:15:19.4802308Z Entering 'third_party/python-six' 2022-05-18T04:15:19.4844248Z Entering 'third_party/sleef' 2022-05-18T04:15:19.4885900Z Entering 'third_party/tbb' 2022-05-18T04:15:19.4929992Z Entering 'third_party/tensorpipe' 2022-05-18T04:15:19.4973520Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-05-18T04:15:19.5015464Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-05-18T04:15:19.5056368Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-05-18T04:15:19.5098471Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T04:15:19.5139395Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T04:15:19.5184329Z Entering 'third_party/zstd' 2022-05-18T04:15:19.5234525Z ##[endgroup] 2022-05-18T04:15:19.5279323Z [command]/usr/bin/git log -1 --format='%H' 2022-05-18T04:15:19.5307530Z '3b2375291aab7b48442f2e6fb1ef66cebc761e24' 2022-05-18T04:15:19.5457424Z Prepare all required actions 2022-05-18T04:15:19.5487738Z ##[group]Run ./.github/actions/setup-linux 2022-05-18T04:15:19.5488026Z env: 2022-05-18T04:15:19.5488234Z IN_CI: 1 2022-05-18T04:15:19.5488461Z IS_GHA: 1 2022-05-18T04:15:19.5488713Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:15:19.5488958Z ##[endgroup] 2022-05-18T04:15:19.5507445Z ##[group]Run set -euo pipefail 2022-05-18T04:15:19.5507768Z set -euo pipefail 
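The git submodule foreach pass above adds a local 'url.https://github.com/.insteadOf' rewrite in every submodule, so any remote of the form 'org-21003710@github.com:<owner>/<repo>' is fetched over HTTPS rather than SSH. A minimal sketch of the same rewrite outside the job (the pytorch/pytorch.git remote below is only an illustration, not taken from the log):

# Register the rewrite in the current repository, as the job does per submodule.
git config --local --add 'url.https://github.com/.insteadOf' 'org-21003710@github.com:'
# With the rewrite active, this SSH-style remote is actually contacted as
# https://github.com/pytorch/pytorch.git:
git ls-remote 'org-21003710@github.com:pytorch/pytorch.git' HEAD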
2022-05-18T04:15:19.5508045Z function get_ec2_metadata() { 2022-05-18T04:15:19.5508389Z  # Pulled from instance metadata endpoint for EC2 2022-05-18T04:15:19.5508866Z  # see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html 2022-05-18T04:15:19.5509278Z  category=$1 2022-05-18T04:15:19.5509595Z  curl -fsSL "http://169.254.169.254/latest/meta-data/${category}" 2022-05-18T04:15:19.5509903Z } 2022-05-18T04:15:19.5510214Z echo "ami-id: $(get_ec2_metadata ami-id)" 2022-05-18T04:15:19.5510553Z echo "instance-id: $(get_ec2_metadata instance-id)" 2022-05-18T04:15:19.5510932Z echo "instance-type: $(get_ec2_metadata instance-type)" 2022-05-18T04:15:19.5511277Z echo "system info $(uname -a)" 2022-05-18T04:15:19.5524670Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T04:15:19.5524971Z env: 2022-05-18T04:15:19.5525195Z IN_CI: 1 2022-05-18T04:15:19.5525401Z IS_GHA: 1 2022-05-18T04:15:19.5525652Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:15:19.5525912Z ##[endgroup] 2022-05-18T04:15:19.5624275Z ami-id: ami-096198a0bccc6bad4 2022-05-18T04:15:19.5686125Z instance-id: i-0f05d6101f258be9b 2022-05-18T04:15:19.5749992Z instance-type: g3.8xlarge 2022-05-18T04:15:19.5758096Z system info Linux ip-10-0-3-31.ec2.internal 4.14.252-195.483.amzn2.x86_64 #1 SMP Mon Nov 1 20:58:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux 2022-05-18T04:15:19.5776257Z ##[group]Run if systemctl is-active --quiet docker; then 2022-05-18T04:15:19.5776650Z if systemctl is-active --quiet docker; then 2022-05-18T04:15:19.5776991Z  echo "Docker daemon is running..."; 2022-05-18T04:15:19.5777272Z else 2022-05-18T04:15:19.5777578Z  echo "Starting docker deamon..." && sudo systemctl start docker; 2022-05-18T04:15:19.5777891Z fi 2022-05-18T04:15:19.5789949Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T04:15:19.5790233Z env: 2022-05-18T04:15:19.5790456Z IN_CI: 1 2022-05-18T04:15:19.5790684Z IS_GHA: 1 2022-05-18T04:15:19.5790920Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:15:19.5791182Z ##[endgroup] 2022-05-18T04:15:19.5840642Z Docker daemon is running... 2022-05-18T04:15:19.5858359Z ##[group]Run AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2022-05-18T04:15:19.5858843Z AWS_ACCOUNT_ID=$(aws sts get-caller-identity|grep Account|cut -f4 -d\") 2022-05-18T04:15:19.5859222Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2022-05-18T04:15:19.5859721Z retry aws ecr get-login*** "$AWS_DEFAULT_REGION" | docker login --username AWS \ 2022-05-18T04:15:19.5860193Z  --password-stdin "$AWS_ACCOUNT_ID.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com" 2022-05-18T04:15:19.5871821Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T04:15:19.5872111Z env: 2022-05-18T04:15:19.5872335Z IN_CI: 1 2022-05-18T04:15:19.5872568Z IS_GHA: 1 2022-05-18T04:15:19.5872806Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:15:19.5873086Z AWS_RETRY_MODE: standard 2022-05-18T04:15:19.5873352Z AWS_MAX_ATTEMPTS: 5 2022-05-18T04:15:19.5873613Z AWS_DEFAULT_REGION: us-east-1 2022-05-18T04:15:19.5873885Z ##[endgroup] 2022-05-18T04:15:20.5604898Z WARNING! Your password will be stored unencrypted in /home/ec2-user/.docker/config.json. 2022-05-18T04:15:20.5605762Z Configure a credential helper to remove this warning. 
See 2022-05-18T04:15:20.5606937Z https://docs.docker.com/engine/reference/commandline/login/#credentials-store 2022-05-18T04:15:20.5607424Z 2022-05-18T04:15:20.5607629Z Login Succeeded 2022-05-18T04:15:20.5645901Z ##[group]Run env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" 2022-05-18T04:15:20.5646341Z env | grep '^GITHUB' > "/tmp/github_env_${GITHUB_RUN_ID}" 2022-05-18T04:15:20.5660835Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T04:15:20.5661165Z env: 2022-05-18T04:15:20.5661387Z IN_CI: 1 2022-05-18T04:15:20.5661596Z IS_GHA: 1 2022-05-18T04:15:20.5661848Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:15:20.5662114Z ##[endgroup] 2022-05-18T04:15:20.5723005Z Prepare all required actions 2022-05-18T04:15:20.5723347Z Getting action download info 2022-05-18T04:15:20.7283182Z Download action repository 'seemethere/add-github-ssh-key@v1' (SHA:1ecffedb1e192a50aa67dba2f0e048e5d3bfa144) 2022-05-18T04:15:20.8465195Z ##[group]Run ./.github/actions/setup-ssh 2022-05-18T04:15:20.8465460Z with: 2022-05-18T04:15:20.8465893Z github-secret: *** 2022-05-18T04:15:20.8466151Z env: 2022-05-18T04:15:20.8466374Z IN_CI: 1 2022-05-18T04:15:20.8466584Z IS_GHA: 1 2022-05-18T04:15:20.8466842Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:15:20.8467109Z ##[endgroup] 2022-05-18T04:15:20.8492500Z ##[group]Run seemethere/add-github-ssh-key@v1 2022-05-18T04:15:20.8492800Z with: 2022-05-18T04:15:20.8493194Z GITHUB_TOKEN: *** 2022-05-18T04:15:20.8493457Z activate-with-label: false 2022-05-18T04:15:20.8493738Z label: with-ssh 2022-05-18T04:15:20.8494014Z remove-existing-keys: true 2022-05-18T04:15:20.8494256Z env: 2022-05-18T04:15:20.8494480Z IN_CI: 1 2022-05-18T04:15:20.8494750Z IS_GHA: 1 2022-05-18T04:15:20.8494990Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:15:20.8495265Z ##[endgroup] 2022-05-18T04:15:20.9207997Z Not on pull request and ciflow reference could not be extracted, skipping adding ssh keys 2022-05-18T04:15:20.9257596Z Prepare all required actions 2022-05-18T04:15:20.9283893Z ##[group]Run ./.github/actions/pull-docker-image 2022-05-18T04:15:20.9284420Z with: 2022-05-18T04:15:20.9285328Z docker-image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T04:15:20.9286215Z env: 2022-05-18T04:15:20.9286627Z IN_CI: 1 2022-05-18T04:15:20.9287027Z IS_GHA: 1 2022-05-18T04:15:20.9287487Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:15:20.9287961Z ##[endgroup] 2022-05-18T04:15:20.9314370Z ##[group]Run retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2022-05-18T04:15:20.9315063Z retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@") } 2022-05-18T04:15:20.9315689Z retry docker pull "${DOCKER_IMAGE}" 2022-05-18T04:15:20.9336197Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T04:15:20.9336794Z env: 2022-05-18T04:15:20.9337184Z IN_CI: 1 2022-05-18T04:15:20.9337664Z IS_GHA: 1 2022-05-18T04:15:20.9338123Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:15:20.9339131Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T04:15:20.9340037Z ##[endgroup] 2022-05-18T04:15:21.1987368Z 6deab82db6a72ca54cd3e3322ee4f13864536734: Pulling from pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7 2022-05-18T04:15:21.1987924Z 58690f9b18fc: Pulling fs layer 2022-05-18T04:15:21.1988217Z b51569e7c507: Pulling fs layer 2022-05-18T04:15:21.1988647Z da8ef40b9eca: Pulling fs layer 
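Both the ECR login and the image pull above go through the same inline retry helper: up to three attempts, sleeping 1 s and then 2 s between them, with the retry wrapping only the command on the left of the pipe. Expanded as a standalone sketch, assuming the masked fragment is AWS CLI v2's 'aws ecr get-login-password' (the exact masked arguments are not recoverable from the log):

# Three attempts total, with 1 s and 2 s pauses between retries.
retry () { "$@" || (sleep 1 && "$@") || (sleep 2 && "$@"); }

AWS_ACCOUNT_ID=$(aws sts get-caller-identity | grep Account | cut -f4 -d\")
# Assumed equivalent of the masked login command (AWS CLI v2):
retry aws ecr get-login-password --region "$AWS_DEFAULT_REGION" \
  | docker login --username AWS --password-stdin \
      "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com"
# The same helper wraps the image pull a few steps later:
retry docker pull "${DOCKER_IMAGE}"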
2022-05-18T04:15:21.1989138Z fb15d46c38dc: Pulling fs layer 2022-05-18T04:15:21.1989651Z e0d2c5aceba3: Pulling fs layer 2022-05-18T04:15:21.1989988Z 9c4425f4b8cb: Pulling fs layer 2022-05-18T04:15:21.1990271Z 3c5d24e8ef06: Pulling fs layer 2022-05-18T04:15:21.1990572Z 79c5859701ff: Pulling fs layer 2022-05-18T04:15:21.1990828Z d81828418d08: Pulling fs layer 2022-05-18T04:15:21.1991150Z f256bb6f705c: Pulling fs layer 2022-05-18T04:15:21.1991635Z cc2d8a95a2e5: Pulling fs layer 2022-05-18T04:15:21.1992124Z b9db730d0400: Pulling fs layer 2022-05-18T04:15:21.1992714Z 49f9027dc2e7: Pulling fs layer 2022-05-18T04:15:21.1993229Z d60308a752bd: Pulling fs layer 2022-05-18T04:15:21.1993699Z 9c4425f4b8cb: Waiting 2022-05-18T04:15:21.1994195Z 624ec6d4936f: Pulling fs layer 2022-05-18T04:15:21.1994701Z b815c3dfeb5c: Pulling fs layer 2022-05-18T04:15:21.1995025Z 3c5d24e8ef06: Waiting 2022-05-18T04:15:21.1995288Z 4a9f9b66af25: Pulling fs layer 2022-05-18T04:15:21.1995575Z b6d963fbdb11: Pulling fs layer 2022-05-18T04:15:21.1995854Z 4486d2823377: Pulling fs layer 2022-05-18T04:15:21.1996095Z 79c5859701ff: Waiting 2022-05-18T04:15:21.1996361Z 68d34a18a767: Pulling fs layer 2022-05-18T04:15:21.1996643Z 1c478b5d7dcd: Pulling fs layer 2022-05-18T04:15:21.1996909Z 1d14eefa2afe: Pulling fs layer 2022-05-18T04:15:21.1997202Z fb15d46c38dc: Waiting 2022-05-18T04:15:21.1997461Z cc2d8a95a2e5: Waiting 2022-05-18T04:15:21.1997721Z cd1fd540bef8: Pulling fs layer 2022-05-18T04:15:21.1997991Z d81828418d08: Waiting 2022-05-18T04:15:21.1999149Z fdc2f33cd3f0: Pulling fs layer 2022-05-18T04:15:21.1999418Z 0626725f1e19: Pulling fs layer 2022-05-18T04:15:21.1999679Z f256bb6f705c: Waiting 2022-05-18T04:15:21.1999931Z b9db730d0400: Waiting 2022-05-18T04:15:21.2000164Z b815c3dfeb5c: Waiting 2022-05-18T04:15:21.2000422Z e0d2c5aceba3: Waiting 2022-05-18T04:15:21.2000700Z 60b6e4baae49: Pulling fs layer 2022-05-18T04:15:21.2000964Z a9f25937ad89: Pulling fs layer 2022-05-18T04:15:21.2001241Z 341c51541e6b: Pulling fs layer 2022-05-18T04:15:21.2001550Z 68d34a18a767: Waiting 2022-05-18T04:15:21.2001936Z cd1fd540bef8: Waiting 2022-05-18T04:15:21.2002182Z 60b6e4baae49: Waiting 2022-05-18T04:15:21.2002432Z 49f9027dc2e7: Waiting 2022-05-18T04:15:21.2002677Z d60308a752bd: Waiting 2022-05-18T04:15:21.2002911Z 4a9f9b66af25: Waiting 2022-05-18T04:15:21.2003157Z 0626725f1e19: Waiting 2022-05-18T04:15:21.2003401Z 624ec6d4936f: Waiting 2022-05-18T04:15:21.2003823Z 1d14eefa2afe: Waiting 2022-05-18T04:15:21.2004136Z 31a8b7b678c7: Pulling fs layer 2022-05-18T04:15:21.2004416Z ad8c1f2236e5: Pulling fs layer 2022-05-18T04:15:21.2004660Z a9f25937ad89: Waiting 2022-05-18T04:15:21.2004923Z f8f22be640a6: Pulling fs layer 2022-05-18T04:15:21.2005184Z 31a8b7b678c7: Waiting 2022-05-18T04:15:21.2005426Z 0b6a6636bca7: Pulling fs layer 2022-05-18T04:15:21.2005701Z 9c50e79f8e38: Pulling fs layer 2022-05-18T04:15:21.2005966Z ad8c1f2236e5: Waiting 2022-05-18T04:15:21.2006212Z 37c76e461e24: Pulling fs layer 2022-05-18T04:15:21.2006473Z 0b6a6636bca7: Waiting 2022-05-18T04:15:21.2006722Z 1c478b5d7dcd: Waiting 2022-05-18T04:15:21.2006948Z 9c50e79f8e38: Waiting 2022-05-18T04:15:21.2007211Z 84c1af12bf7e: Pulling fs layer 2022-05-18T04:15:21.2007490Z 30d627f75fb9: Pulling fs layer 2022-05-18T04:15:21.2007752Z 0d7a717fbbe1: Pulling fs layer 2022-05-18T04:15:21.2008151Z 84c1af12bf7e: Waiting 2022-05-18T04:15:21.2008418Z 4c42c8b107a9: Pulling fs layer 2022-05-18T04:15:21.2008666Z 30d627f75fb9: Waiting 2022-05-18T04:15:21.2008931Z dcb77576adf6: Pulling fs layer 
2022-05-18T04:15:21.2009583Z 547da1897fee: Pulling fs layer 2022-05-18T04:15:21.2009842Z 0d7a717fbbe1: Waiting 2022-05-18T04:15:21.2010110Z e31574ad02fc: Pulling fs layer 2022-05-18T04:15:21.2011592Z a9dad096f89d: Pulling fs layer 2022-05-18T04:15:21.2012001Z 4c42c8b107a9: Waiting 2022-05-18T04:15:21.2012255Z dcb77576adf6: Waiting 2022-05-18T04:15:21.2012505Z e31574ad02fc: Waiting 2022-05-18T04:15:21.2012750Z 2c6e0c416cd7: Pulling fs layer 2022-05-18T04:15:21.2013032Z 9642ca476af3: Pulling fs layer 2022-05-18T04:15:21.2013316Z 23d19ef2f74c: Pulling fs layer 2022-05-18T04:15:21.2013574Z 097f8fc9708d: Pulling fs layer 2022-05-18T04:15:21.2013857Z 905dc4a4a899: Pulling fs layer 2022-05-18T04:15:21.2014122Z 2c6e0c416cd7: Waiting 2022-05-18T04:15:21.2014354Z 9642ca476af3: Waiting 2022-05-18T04:15:21.2014628Z d38486135f29: Pulling fs layer 2022-05-18T04:15:21.2014892Z 097f8fc9708d: Waiting 2022-05-18T04:15:21.2015160Z db1065d40131: Pulling fs layer 2022-05-18T04:15:21.2015405Z 905dc4a4a899: Waiting 2022-05-18T04:15:21.2015672Z 22de453da86e: Pulling fs layer 2022-05-18T04:15:21.2015932Z d38486135f29: Waiting 2022-05-18T04:15:21.2016171Z dc725e8f0593: Pulling fs layer 2022-05-18T04:15:21.2016450Z a0bccb87b633: Pulling fs layer 2022-05-18T04:15:21.2016720Z 23d19ef2f74c: Waiting 2022-05-18T04:15:21.2016952Z dc725e8f0593: Waiting 2022-05-18T04:15:21.2730994Z b51569e7c507: Verifying Checksum 2022-05-18T04:15:21.2731326Z b51569e7c507: Download complete 2022-05-18T04:15:21.3526539Z fb15d46c38dc: Download complete 2022-05-18T04:15:21.4903620Z da8ef40b9eca: Verifying Checksum 2022-05-18T04:15:21.4903991Z da8ef40b9eca: Download complete 2022-05-18T04:15:21.4922327Z e0d2c5aceba3: Verifying Checksum 2022-05-18T04:15:21.4922947Z e0d2c5aceba3: Download complete 2022-05-18T04:15:21.5710450Z 3c5d24e8ef06: Verifying Checksum 2022-05-18T04:15:21.5710799Z 3c5d24e8ef06: Download complete 2022-05-18T04:15:21.6441267Z 79c5859701ff: Verifying Checksum 2022-05-18T04:15:21.6441597Z 79c5859701ff: Download complete 2022-05-18T04:15:21.6786418Z 9c4425f4b8cb: Verifying Checksum 2022-05-18T04:15:21.6786756Z 9c4425f4b8cb: Download complete 2022-05-18T04:15:21.7308244Z 58690f9b18fc: Verifying Checksum 2022-05-18T04:15:21.7308566Z 58690f9b18fc: Download complete 2022-05-18T04:15:21.7644184Z f256bb6f705c: Verifying Checksum 2022-05-18T04:15:21.7644672Z f256bb6f705c: Download complete 2022-05-18T04:15:21.8577288Z b9db730d0400: Verifying Checksum 2022-05-18T04:15:21.8577631Z b9db730d0400: Download complete 2022-05-18T04:15:21.9421249Z 49f9027dc2e7: Verifying Checksum 2022-05-18T04:15:21.9421901Z 49f9027dc2e7: Download complete 2022-05-18T04:15:23.1143648Z 58690f9b18fc: Pull complete 2022-05-18T04:15:23.2495605Z b51569e7c507: Pull complete 2022-05-18T04:15:23.3657245Z da8ef40b9eca: Pull complete 2022-05-18T04:15:23.4904955Z fb15d46c38dc: Pull complete 2022-05-18T04:15:23.6928325Z d60308a752bd: Verifying Checksum 2022-05-18T04:15:23.6928672Z d60308a752bd: Download complete 2022-05-18T04:15:23.7718942Z 624ec6d4936f: Verifying Checksum 2022-05-18T04:15:23.7719324Z 624ec6d4936f: Download complete 2022-05-18T04:15:23.7750545Z e0d2c5aceba3: Pull complete 2022-05-18T04:15:23.8548420Z b815c3dfeb5c: Verifying Checksum 2022-05-18T04:15:23.8548789Z b815c3dfeb5c: Download complete 2022-05-18T04:15:23.9226131Z 4a9f9b66af25: Verifying Checksum 2022-05-18T04:15:23.9226759Z 4a9f9b66af25: Download complete 2022-05-18T04:15:24.1196974Z 9c4425f4b8cb: Pull complete 2022-05-18T04:15:24.2506194Z 3c5d24e8ef06: Pull complete 2022-05-18T04:15:24.3245750Z 
b6d963fbdb11: Verifying Checksum 2022-05-18T04:15:24.3246146Z b6d963fbdb11: Download complete 2022-05-18T04:15:24.3818755Z 79c5859701ff: Pull complete 2022-05-18T04:15:24.4230802Z 4486d2823377: Verifying Checksum 2022-05-18T04:15:24.4231113Z 4486d2823377: Download complete 2022-05-18T04:15:24.5024585Z 68d34a18a767: Verifying Checksum 2022-05-18T04:15:24.5025074Z 68d34a18a767: Download complete 2022-05-18T04:15:30.3765457Z d81828418d08: Verifying Checksum 2022-05-18T04:15:30.3765816Z d81828418d08: Download complete 2022-05-18T04:15:30.4645014Z 1d14eefa2afe: Download complete 2022-05-18T04:15:30.5588744Z cd1fd540bef8: Verifying Checksum 2022-05-18T04:15:30.5589103Z cd1fd540bef8: Download complete 2022-05-18T04:15:31.1047317Z fdc2f33cd3f0: Verifying Checksum 2022-05-18T04:15:31.1047763Z fdc2f33cd3f0: Download complete 2022-05-18T04:15:31.2066406Z 0626725f1e19: Verifying Checksum 2022-05-18T04:15:31.2067020Z 0626725f1e19: Download complete 2022-05-18T04:15:31.3525329Z a9f25937ad89: Verifying Checksum 2022-05-18T04:15:31.3526035Z a9f25937ad89: Download complete 2022-05-18T04:15:32.3114956Z 341c51541e6b: Verifying Checksum 2022-05-18T04:15:32.3115364Z 341c51541e6b: Download complete 2022-05-18T04:15:32.3589942Z cc2d8a95a2e5: Verifying Checksum 2022-05-18T04:15:32.3590292Z cc2d8a95a2e5: Download complete 2022-05-18T04:15:32.3890226Z 31a8b7b678c7: Download complete 2022-05-18T04:15:32.4421782Z ad8c1f2236e5: Verifying Checksum 2022-05-18T04:15:32.4935659Z f8f22be640a6: Verifying Checksum 2022-05-18T04:15:32.4936243Z f8f22be640a6: Download complete 2022-05-18T04:15:32.5303932Z 0b6a6636bca7: Verifying Checksum 2022-05-18T04:15:32.5304515Z 0b6a6636bca7: Download complete 2022-05-18T04:15:32.5796429Z 9c50e79f8e38: Verifying Checksum 2022-05-18T04:15:32.5796761Z 9c50e79f8e38: Download complete 2022-05-18T04:15:32.6679073Z 84c1af12bf7e: Verifying Checksum 2022-05-18T04:15:32.6679610Z 84c1af12bf7e: Download complete 2022-05-18T04:15:32.7479850Z 30d627f75fb9: Download complete 2022-05-18T04:15:33.0168758Z 0d7a717fbbe1: Verifying Checksum 2022-05-18T04:15:33.0169423Z 0d7a717fbbe1: Download complete 2022-05-18T04:15:33.1048866Z 4c42c8b107a9: Download complete 2022-05-18T04:15:33.5195366Z dcb77576adf6: Verifying Checksum 2022-05-18T04:15:33.5195954Z dcb77576adf6: Download complete 2022-05-18T04:15:33.5661850Z 37c76e461e24: Verifying Checksum 2022-05-18T04:15:33.5662221Z 37c76e461e24: Download complete 2022-05-18T04:15:33.6146395Z 547da1897fee: Verifying Checksum 2022-05-18T04:15:33.6146780Z 547da1897fee: Download complete 2022-05-18T04:15:33.6516561Z e31574ad02fc: Verifying Checksum 2022-05-18T04:15:33.6516895Z e31574ad02fc: Download complete 2022-05-18T04:15:33.7426585Z 2c6e0c416cd7: Verifying Checksum 2022-05-18T04:15:33.7427010Z 2c6e0c416cd7: Download complete 2022-05-18T04:15:33.8354091Z 9642ca476af3: Verifying Checksum 2022-05-18T04:15:33.8354680Z 9642ca476af3: Download complete 2022-05-18T04:15:33.9226346Z 23d19ef2f74c: Download complete 2022-05-18T04:15:34.0148714Z 097f8fc9708d: Verifying Checksum 2022-05-18T04:15:34.2008578Z 905dc4a4a899: Verifying Checksum 2022-05-18T04:15:34.2008959Z 905dc4a4a899: Download complete 2022-05-18T04:15:34.2888599Z d38486135f29: Verifying Checksum 2022-05-18T04:15:34.2888950Z d38486135f29: Download complete 2022-05-18T04:15:34.8972098Z db1065d40131: Verifying Checksum 2022-05-18T04:15:34.8972724Z db1065d40131: Download complete 2022-05-18T04:15:34.9854230Z 22de453da86e: Verifying Checksum 2022-05-18T04:15:34.9854548Z 22de453da86e: Download complete 
2022-05-18T04:15:37.2644832Z a9dad096f89d: Verifying Checksum 2022-05-18T04:15:37.2645208Z a9dad096f89d: Download complete 2022-05-18T04:15:37.3397663Z a0bccb87b633: Verifying Checksum 2022-05-18T04:15:37.3398457Z a0bccb87b633: Download complete 2022-05-18T04:15:38.5822182Z 1c478b5d7dcd: Verifying Checksum 2022-05-18T04:15:38.5822541Z 1c478b5d7dcd: Download complete 2022-05-18T04:15:41.3714675Z d81828418d08: Pull complete 2022-05-18T04:15:41.4787696Z f256bb6f705c: Pull complete 2022-05-18T04:15:57.8480929Z cc2d8a95a2e5: Pull complete 2022-05-18T04:15:59.4935475Z b9db730d0400: Pull complete 2022-05-18T04:16:01.4257835Z 49f9027dc2e7: Pull complete 2022-05-18T04:16:04.9842871Z dc725e8f0593: Download complete 2022-05-18T04:16:08.2474187Z d60308a752bd: Pull complete 2022-05-18T04:16:10.1258032Z 624ec6d4936f: Pull complete 2022-05-18T04:16:12.0038839Z b815c3dfeb5c: Pull complete 2022-05-18T04:16:13.8506688Z 4a9f9b66af25: Pull complete 2022-05-18T04:16:17.4948652Z b6d963fbdb11: Pull complete 2022-05-18T04:16:19.6276257Z 4486d2823377: Pull complete 2022-05-18T04:16:21.5068204Z 68d34a18a767: Pull complete 2022-05-18T04:16:46.3665083Z 1c478b5d7dcd: Pull complete 2022-05-18T04:16:48.6029915Z 1d14eefa2afe: Pull complete 2022-05-18T04:16:50.4797879Z cd1fd540bef8: Pull complete 2022-05-18T04:16:53.5946553Z fdc2f33cd3f0: Pull complete 2022-05-18T04:16:55.6229194Z 0626725f1e19: Pull complete 2022-05-18T04:16:57.5314684Z 60b6e4baae49: Pull complete 2022-05-18T04:16:59.4032183Z a9f25937ad89: Pull complete 2022-05-18T04:17:03.4269169Z 341c51541e6b: Pull complete 2022-05-18T04:17:05.8190281Z 31a8b7b678c7: Pull complete 2022-05-18T04:17:08.6488155Z ad8c1f2236e5: Pull complete 2022-05-18T04:17:08.7934941Z f8f22be640a6: Pull complete 2022-05-18T04:17:08.8981748Z 0b6a6636bca7: Pull complete 2022-05-18T04:17:09.0227481Z 9c50e79f8e38: Pull complete 2022-05-18T04:17:11.4275986Z 37c76e461e24: Pull complete 2022-05-18T04:17:11.5381766Z 84c1af12bf7e: Pull complete 2022-05-18T04:17:11.6513143Z 30d627f75fb9: Pull complete 2022-05-18T04:17:12.0038580Z 0d7a717fbbe1: Pull complete 2022-05-18T04:17:12.1060093Z 4c42c8b107a9: Pull complete 2022-05-18T04:17:13.2862355Z dcb77576adf6: Pull complete 2022-05-18T04:17:13.4018056Z 547da1897fee: Pull complete 2022-05-18T04:17:13.5191706Z e31574ad02fc: Pull complete 2022-05-18T04:17:20.9158503Z a9dad096f89d: Pull complete 2022-05-18T04:17:23.3402309Z 2c6e0c416cd7: Pull complete 2022-05-18T04:17:25.6775614Z 9642ca476af3: Pull complete 2022-05-18T04:17:27.5560546Z 23d19ef2f74c: Pull complete 2022-05-18T04:17:29.4018304Z 097f8fc9708d: Pull complete 2022-05-18T04:17:32.0095105Z 905dc4a4a899: Pull complete 2022-05-18T04:17:33.7536690Z d38486135f29: Pull complete 2022-05-18T04:17:37.3916313Z db1065d40131: Pull complete 2022-05-18T04:17:37.4969622Z 22de453da86e: Pull complete 2022-05-18T04:18:18.6233740Z dc725e8f0593: Pull complete 2022-05-18T04:18:20.5341472Z a0bccb87b633: Pull complete 2022-05-18T04:18:21.8818666Z Digest: sha256:66b56fbc2d0d8bf75af01c4976aba15f28c9802507dc01f27e71a55f8ffc13e0 2022-05-18T04:18:22.3826019Z Status: Downloaded newer image for 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T04:18:22.6649834Z 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T04:18:22.6783871Z ##[group]Run nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a 2022-05-18T04:18:22.6784258Z 
with: 2022-05-18T04:18:22.6784515Z timeout_minutes: 10 2022-05-18T04:18:22.6784788Z max_attempts: 3 2022-05-18T04:18:22.6785173Z command: set -ex bash .github/scripts/install_nvidia_utils_linux.sh echo "GPU_FLAG=--gpus all" >> "${GITHUB_ENV}" 2022-05-18T04:18:22.6785581Z retry_wait_seconds: 10 2022-05-18T04:18:22.6785879Z polling_interval_seconds: 1 2022-05-18T04:18:22.6786174Z warning_on_retry: true 2022-05-18T04:18:22.6786445Z continue_on_error: false 2022-05-18T04:18:22.6786709Z env: 2022-05-18T04:18:22.6786945Z IN_CI: 1 2022-05-18T04:18:22.6787162Z IS_GHA: 1 2022-05-18T04:18:22.6787434Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:18:22.6787714Z ##[endgroup] 2022-05-18T04:18:22.7284031Z 2022-05-18T04:18:22.7364977Z == Installing nvidia container toolkit for amzn2 == 2022-05-18T04:18:22.7368309Z + bash .github/scripts/install_nvidia_utils_linux.sh 2022-05-18T04:18:22.7368759Z + sudo yum install -y yum-utils 2022-05-18T04:18:23.2859752Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-05-18T04:18:24.6221426Z Package yum-utils-1.1.31-46.amzn2.0.1.noarch already installed and latest version 2022-05-18T04:18:24.6222135Z Nothing to do 2022-05-18T04:18:24.6961434Z + sudo yum-config-manager --add-repo https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo 2022-05-18T04:18:25.2826848Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-05-18T04:18:25.3196168Z adding repo from: https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo 2022-05-18T04:18:25.3196827Z grabbing file https://nvidia.github.io/nvidia-docker/amzn2/nvidia-docker.repo to /etc/yum.repos.d/nvidia-docker.repo 2022-05-18T04:18:25.3197361Z repo saved to /etc/yum.repos.d/nvidia-docker.repo 2022-05-18T04:18:25.3341092Z + sudo yum install -y nvidia-docker2 2022-05-18T04:18:25.8742181Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-05-18T04:18:27.1604466Z Package nvidia-docker2-2.10.0-1.noarch already installed and latest version 2022-05-18T04:18:27.1605204Z Nothing to do 2022-05-18T04:18:27.2400915Z + sudo systemctl restart docker 2022-05-18T04:18:53.7039060Z == Installing nvidia driver NVIDIA-Linux-x86_64-510.60.02.run == 2022-05-18T04:18:53.7040204Z + sudo yum groupinstall -y 'Development Tools' 2022-05-18T04:18:54.2719401Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-05-18T04:18:55.3100524Z Maybe run: yum groups mark install (see man yum) 2022-05-18T04:18:55.3101514Z No packages in any requested group available to install or update 2022-05-18T04:18:55.3817194Z ++ uname -r 2022-05-18T04:18:55.3822266Z + sudo yum install -y 'kernel-devel-uname-r == 4.14.252-195.483.amzn2.x86_64' 2022-05-18T04:18:55.9509715Z Loaded plugins: extras_suggestions, langpacks, priorities, update-motd 2022-05-18T04:18:57.2257322Z Package kernel-devel-4.14.252-195.483.amzn2.x86_64 already installed and latest version 2022-05-18T04:18:57.2259473Z Nothing to do 2022-05-18T04:18:57.2995987Z + sudo curl -fsL -o /tmp/nvidia_driver https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-510.60.02.run 2022-05-18T04:19:00.6430469Z + sudo /bin/bash /tmp/nvidia_driver -s --no-drm 2022-05-18T04:19:01.8982121Z Verifying archive integrity... 
OK 2022-05-18T04:19:26.1331091Z Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 510.60.02.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... 2022-05-18T04:19:26.3720744Z 2022-05-18T04:19:26.3721458Z WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver. 2022-05-18T04:19:26.3722415Z 2022-05-18T04:19:41.2893409Z 2022-05-18T04:19:41.2894851Z WARNING: nvidia-installer was forced to guess the X library path '/usr/lib64' and X module path '/usr/lib64/xorg/modules'; these paths were not queryable from the system. If X fails to find the NVIDIA X driver module, please install the `pkg-config` utility and the X.Org SDK/development package for your distribution and reinstall the driver. 2022-05-18T04:19:41.2895498Z 2022-05-18T04:19:50.1903220Z + sudo rm -fv /tmp/nvidia_driver 2022-05-18T04:19:50.2664439Z removed ‘/tmp/nvidia_driver’ 2022-05-18T04:19:50.2679899Z + nvidia-smi 2022-05-18T04:19:54.4486123Z Wed May 18 04:19:54 2022 2022-05-18T04:19:54.4486709Z +-----------------------------------------------------------------------------+ 2022-05-18T04:19:54.4491549Z | NVIDIA-SMI 510.60.02 Driver Version: 510.60.02 CUDA Version: 11.6 | 2022-05-18T04:19:54.4492482Z |-------------------------------+----------------------+----------------------+ 2022-05-18T04:19:54.4493330Z | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | 2022-05-18T04:19:54.4494234Z | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | 2022-05-18T04:19:54.4494846Z | | | MIG M. 
| 2022-05-18T04:19:54.4495674Z |===============================+======================+======================| 2022-05-18T04:19:54.4537290Z | 0 Tesla M60 Off | 00000000:00:1D.0 Off | 0 | 2022-05-18T04:19:54.4537661Z | N/A 29C P0 38W / 150W | 0MiB / 7680MiB | 0% Default | 2022-05-18T04:19:54.4537969Z | | | N/A | 2022-05-18T04:19:54.4538450Z +-------------------------------+----------------------+----------------------+ 2022-05-18T04:19:54.4585847Z | 1 Tesla M60 Off | 00000000:00:1E.0 Off | 0 | 2022-05-18T04:19:54.4586493Z | N/A 36C P0 37W / 150W | 0MiB / 7680MiB | 100% Default | 2022-05-18T04:19:54.4587067Z | | | N/A | 2022-05-18T04:19:54.4587970Z +-------------------------------+----------------------+----------------------+ 2022-05-18T04:19:54.4588670Z 2022-05-18T04:19:54.4589510Z +-----------------------------------------------------------------------------+ 2022-05-18T04:19:54.4590203Z | Processes: | 2022-05-18T04:19:54.4590827Z | GPU GI CI PID Type Process name GPU Memory | 2022-05-18T04:19:54.4591448Z | ID ID Usage | 2022-05-18T04:19:54.4591962Z |=============================================================================| 2022-05-18T04:19:54.4592532Z | No running processes found | 2022-05-18T04:19:54.4593326Z +-----------------------------------------------------------------------------+ 2022-05-18T04:19:54.9652475Z + echo 'GPU_FLAG=--gpus all' 2022-05-18T04:19:55.8293006Z Command completed after 1 attempt(s). 2022-05-18T04:19:55.8293695Z 2022-05-18T04:19:55.8367633Z Prepare all required actions 2022-05-18T04:19:55.8368079Z Getting action download info 2022-05-18T04:19:55.9809353Z Download action repository 'seemethere/download-artifact-s3@v3' (SHA:64048a097659c8ca71ceacbb3c01cee9ed6f1b05) 2022-05-18T04:19:56.1478831Z Download action repository 'actions/download-artifact@v2' (SHA:f023be2c48cc18debc3bacd34cb396e0295e2869) 2022-05-18T04:19:56.2650093Z ##[group]Run ./.github/actions/download-build-artifacts 2022-05-18T04:19:56.2651192Z with: 2022-05-18T04:19:56.2651479Z name: linux-xenial-cuda11.3-py3.7-gcc7 2022-05-18T04:19:56.2651774Z env: 2022-05-18T04:19:56.2652002Z IN_CI: 1 2022-05-18T04:19:56.2652215Z IS_GHA: 1 2022-05-18T04:19:56.2652474Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:19:56.2652753Z GPU_FLAG: --gpus all 2022-05-18T04:19:56.2652990Z ##[endgroup] 2022-05-18T04:19:56.2682101Z ##[group]Run seemethere/download-artifact-s3@v3 2022-05-18T04:19:56.2682412Z with: 2022-05-18T04:19:56.2682750Z name: linux-xenial-cuda11.3-py3.7-gcc7 2022-05-18T04:19:56.2683051Z s3-bucket: gha-artifacts 2022-05-18T04:19:56.2683338Z region: us-east-1 2022-05-18T04:19:56.2683582Z env: 2022-05-18T04:19:56.2683785Z IN_CI: 1 2022-05-18T04:19:56.2684020Z IS_GHA: 1 2022-05-18T04:19:56.2684282Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:19:56.2684540Z GPU_FLAG: --gpus all 2022-05-18T04:19:56.2684795Z ##[endgroup] 2022-05-18T04:19:56.7805245Z Found 1 objects with prefix pytorch/pytorch/2342799944/1/linux-xenial-cuda11.3-py3.7-gcc7/ 2022-05-18T04:19:56.7806425Z Starting download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2022-05-18T04:20:13.6146399Z Finished download (1/1): /home/ec2-user/actions-runner/_work/pytorch/pytorch/artifacts.zip 2022-05-18T04:20:13.6146770Z 2022-05-18T04:20:13.6147925Z Artifact download has finished successfully 2022-05-18T04:20:13.6289064Z ##[group]Run unzip -o artifacts.zip 2022-05-18T04:20:13.6289385Z unzip -o artifacts.zip 2022-05-18T04:20:13.6303689Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T04:20:13.6304123Z env: 
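The retried install step above finishes by verifying the driver with nvidia-smi (two Tesla M60 GPUs are visible) and by appending GPU_FLAG=--gpus all to the job environment. How that flag is consumed is not shown in this part of the log; a plausible sketch of a later container step using it (the docker run line is an assumption, not copied from the workflow):

# GPU_FLAG was written to "${GITHUB_ENV}", so later steps see it as an env var.
# Unquoted expansion is intentional: it becomes the two arguments --gpus and all.
docker run ${GPU_FLAG:-} --rm "${DOCKER_IMAGE}" nvidia-smi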
2022-05-18T04:20:13.6304346Z IN_CI: 1 2022-05-18T04:20:13.6304556Z IS_GHA: 1 2022-05-18T04:20:13.6304807Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:20:13.6305079Z GPU_FLAG: --gpus all 2022-05-18T04:20:13.6305315Z ##[endgroup] 2022-05-18T04:20:13.6349983Z Archive: artifacts.zip 2022-05-18T04:20:13.6351985Z creating: dist/ 2022-05-18T04:20:16.1076534Z inflating: dist/torch-1.12.0a0+git3b23752-cp37-cp37m-linux_x86_64.whl 2022-05-18T04:20:16.1076979Z creating: build/custom_test_artifacts/ 2022-05-18T04:20:16.1077387Z creating: build/custom_test_artifacts/custom-op-build/ 2022-05-18T04:20:16.1077860Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/ 2022-05-18T04:20:16.1083502Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeOutput.log 2022-05-18T04:20:16.1084053Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/ 2022-05-18T04:20:16.1084612Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CMakeSystem.cmake 2022-05-18T04:20:16.1085187Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdC/ 2022-05-18T04:20:16.1085744Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdC/tmp/ 2022-05-18T04:20:16.1087030Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdC/CMakeCCompilerId.c 2022-05-18T04:20:16.1088867Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdC/a.out 2022-05-18T04:20:16.1089443Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCXX/ 2022-05-18T04:20:16.1090006Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCXX/tmp/ 2022-05-18T04:20:16.1091688Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCXX/CMakeCXXCompilerId.cpp 2022-05-18T04:20:16.1093265Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCXX/a.out 2022-05-18T04:20:16.1094703Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CMakeDetermineCompilerABI_C.bin 2022-05-18T04:20:16.1095468Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CMakeCCompiler.cmake 2022-05-18T04:20:16.1097256Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CMakeDetermineCompilerABI_CXX.bin 2022-05-18T04:20:16.1098006Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CMakeCXXCompiler.cmake 2022-05-18T04:20:16.1098599Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/ 2022-05-18T04:20:16.1099171Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/ 2022-05-18T04:20:16.1148682Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2022-05-18T04:20:16.1149395Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2022-05-18T04:20:16.1150123Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2022-05-18T04:20:16.1150863Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2022-05-18T04:20:16.1151591Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2022-05-18T04:20:16.1152293Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2022-05-18T04:20:16.1153402Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2022-05-18T04:20:16.1154473Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2022-05-18T04:20:16.1155464Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2022-05-18T04:20:16.1192857Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2022-05-18T04:20:16.1229525Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2022-05-18T04:20:16.1230650Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2022-05-18T04:20:16.1231481Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2022-05-18T04:20:16.1232126Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.reg.c 2022-05-18T04:20:16.1233018Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.fatbin 2022-05-18T04:20:16.1234008Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2022-05-18T04:20:16.1235010Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.o 2022-05-18T04:20:16.1236510Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/CMakeCUDACompilerId.cu 2022-05-18T04:20:16.1304462Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CompilerIdCUDA/a.out 2022-05-18T04:20:16.1372313Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CMakeDetermineCompilerABI_CUDA.bin 2022-05-18T04:20:16.1372972Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/3.10.3/CMakeCUDACompiler.cmake 2022-05-18T04:20:16.1373534Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeTmp/ 2022-05-18T04:20:16.1374202Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/feature_tests.c 2022-05-18T04:20:16.1375089Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/feature_tests.cxx 2022-05-18T04:20:16.1377058Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/feature_tests.bin 2022-05-18T04:20:16.1377757Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeError.log 2022-05-18T04:20:16.1378334Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/cmake.check_cache 2022-05-18T04:20:16.1378884Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/ 2022-05-18T04:20:16.1402685Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.make 2022-05-18T04:20:16.1403284Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/link.txt 2022-05-18T04:20:16.1403875Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/cmake_clean.cmake 2022-05-18T04:20:16.1404956Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/build.make 2022-05-18T04:20:16.1405731Z inflating: 
build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/DependInfo.cmake 2022-05-18T04:20:16.1406339Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/flags.make 2022-05-18T04:20:16.1406937Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/progress.make 2022-05-18T04:20:16.1464125Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/CXX.includecache 2022-05-18T04:20:16.1482177Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/depend.internal 2022-05-18T04:20:16.1590775Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/custom_ops.dir/op.cpp.o 2022-05-18T04:20:16.1591344Z creating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/ 2022-05-18T04:20:16.1618293Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.make 2022-05-18T04:20:16.1619034Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/link.txt 2022-05-18T04:20:16.1619649Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/cmake_clean.cmake 2022-05-18T04:20:16.1620609Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/build.make 2022-05-18T04:20:16.1621248Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/DependInfo.cmake 2022-05-18T04:20:16.1621940Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/flags.make 2022-05-18T04:20:16.1622548Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/progress.make 2022-05-18T04:20:16.1680396Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/CXX.includecache 2022-05-18T04:20:16.1698563Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/depend.internal 2022-05-18T04:20:16.1777697Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/test_custom_ops.dir/test_custom_ops.cpp.o 2022-05-18T04:20:16.1778339Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/CMakeDirectoryInformation.cmake 2022-05-18T04:20:16.1778953Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/TargetDirectories.txt 2022-05-18T04:20:16.1779518Z extracting: build/custom_test_artifacts/custom-op-build/CMakeFiles/progress.marks 2022-05-18T04:20:16.1780325Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile2 2022-05-18T04:20:16.1782062Z inflating: build/custom_test_artifacts/custom-op-build/CMakeFiles/Makefile.cmake 2022-05-18T04:20:16.1782601Z inflating: build/custom_test_artifacts/custom-op-build/detect_cuda_version.cc 2022-05-18T04:20:16.1785214Z inflating: build/custom_test_artifacts/custom-op-build/CMakeCache.txt 2022-05-18T04:20:16.1786020Z inflating: build/custom_test_artifacts/custom-op-build/Makefile 2022-05-18T04:20:16.1786659Z inflating: build/custom_test_artifacts/custom-op-build/cmake_install.cmake 2022-05-18T04:20:16.1875881Z inflating: build/custom_test_artifacts/custom-op-build/libcustom_ops.so 2022-05-18T04:20:16.1936755Z inflating: build/custom_test_artifacts/custom-op-build/test_custom_ops 2022-05-18T04:20:16.1937212Z creating: build/custom_test_artifacts/jit-hook-build/ 2022-05-18T04:20:16.1937667Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/ 2022-05-18T04:20:16.1942920Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeOutput.log 2022-05-18T04:20:16.1943460Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/ 2022-05-18T04:20:16.1943991Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CMakeSystem.cmake 2022-05-18T04:20:16.1944551Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdC/ 2022-05-18T04:20:16.1945106Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdC/tmp/ 2022-05-18T04:20:16.1946352Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdC/CMakeCCompilerId.c 2022-05-18T04:20:16.1947935Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdC/a.out 2022-05-18T04:20:16.1948497Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCXX/ 2022-05-18T04:20:16.1949056Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCXX/tmp/ 2022-05-18T04:20:16.1950937Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCXX/CMakeCXXCompilerId.cpp 2022-05-18T04:20:16.1952036Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCXX/a.out 2022-05-18T04:20:16.1953882Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CMakeDetermineCompilerABI_C.bin 2022-05-18T04:20:16.1954631Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CMakeCCompiler.cmake 2022-05-18T04:20:16.1955878Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CMakeDetermineCompilerABI_CXX.bin 2022-05-18T04:20:16.1956891Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CMakeCXXCompiler.cmake 2022-05-18T04:20:16.1957469Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/ 2022-05-18T04:20:16.1958030Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/ 2022-05-18T04:20:16.2007782Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2022-05-18T04:20:16.2008491Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2022-05-18T04:20:16.2009205Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2022-05-18T04:20:16.2009941Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2022-05-18T04:20:16.2010862Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2022-05-18T04:20:16.2011536Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2022-05-18T04:20:16.2012227Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2022-05-18T04:20:16.2012915Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2022-05-18T04:20:16.2013956Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2022-05-18T04:20:16.2051270Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2022-05-18T04:20:16.2087876Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2022-05-18T04:20:16.2088892Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2022-05-18T04:20:16.2089944Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2022-05-18T04:20:16.2090871Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.reg.c 2022-05-18T04:20:16.2091515Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.fatbin 2022-05-18T04:20:16.2092379Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2022-05-18T04:20:16.2093352Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.o 2022-05-18T04:20:16.2094763Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/CMakeCUDACompilerId.cu 2022-05-18T04:20:16.2162587Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CompilerIdCUDA/a.out 2022-05-18T04:20:16.2230514Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CMakeDetermineCompilerABI_CUDA.bin 2022-05-18T04:20:16.2231163Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/3.10.3/CMakeCUDACompiler.cmake 2022-05-18T04:20:16.2231730Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeTmp/ 2022-05-18T04:20:16.2232267Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/feature_tests.c 2022-05-18T04:20:16.2233253Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/feature_tests.cxx 2022-05-18T04:20:16.2235369Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/feature_tests.bin 2022-05-18T04:20:16.2235923Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeError.log 2022-05-18T04:20:16.2236477Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/cmake.check_cache 2022-05-18T04:20:16.2237027Z creating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/ 2022-05-18T04:20:16.2263900Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.make 2022-05-18T04:20:16.2264500Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/link.txt 2022-05-18T04:20:16.2265106Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/cmake_clean.cmake 2022-05-18T04:20:16.2266125Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/build.make 2022-05-18T04:20:16.2266776Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/DependInfo.cmake 2022-05-18T04:20:16.2267458Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/flags.make 2022-05-18T04:20:16.2268169Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/progress.make 2022-05-18T04:20:16.2325321Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/CXX.includecache 2022-05-18T04:20:16.2344229Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/depend.internal 2022-05-18T04:20:16.2406614Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/test_jit_hooks.dir/test_jit_hooks.cpp.o 2022-05-18T04:20:16.2407254Z inflating: 
build/custom_test_artifacts/jit-hook-build/CMakeFiles/CMakeDirectoryInformation.cmake 2022-05-18T04:20:16.2407977Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/TargetDirectories.txt 2022-05-18T04:20:16.2408565Z extracting: build/custom_test_artifacts/jit-hook-build/CMakeFiles/progress.marks 2022-05-18T04:20:16.2409247Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile2 2022-05-18T04:20:16.2410950Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeFiles/Makefile.cmake 2022-05-18T04:20:16.2411487Z inflating: build/custom_test_artifacts/jit-hook-build/detect_cuda_version.cc 2022-05-18T04:20:16.2413983Z inflating: build/custom_test_artifacts/jit-hook-build/CMakeCache.txt 2022-05-18T04:20:16.2414557Z inflating: build/custom_test_artifacts/jit-hook-build/Makefile 2022-05-18T04:20:16.2415406Z inflating: build/custom_test_artifacts/jit-hook-build/cmake_install.cmake 2022-05-18T04:20:16.2463662Z inflating: build/custom_test_artifacts/jit-hook-build/test_jit_hooks 2022-05-18T04:20:16.2464146Z creating: build/custom_test_artifacts/custom-backend-build/ 2022-05-18T04:20:16.2464625Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/ 2022-05-18T04:20:16.2469881Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeOutput.log 2022-05-18T04:20:16.2470448Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/ 2022-05-18T04:20:16.2471008Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CMakeSystem.cmake 2022-05-18T04:20:16.2471595Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdC/ 2022-05-18T04:20:16.2472173Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdC/tmp/ 2022-05-18T04:20:16.2473303Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdC/CMakeCCompilerId.c 2022-05-18T04:20:16.2474764Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdC/a.out 2022-05-18T04:20:16.2475357Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCXX/ 2022-05-18T04:20:16.2475949Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCXX/tmp/ 2022-05-18T04:20:16.2477986Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCXX/CMakeCXXCompilerId.cpp 2022-05-18T04:20:16.2479952Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCXX/a.out 2022-05-18T04:20:16.2481203Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CMakeDetermineCompilerABI_C.bin 2022-05-18T04:20:16.2482067Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CMakeCCompiler.cmake 2022-05-18T04:20:16.2483686Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CMakeDetermineCompilerABI_CXX.bin 2022-05-18T04:20:16.2484588Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CMakeCXXCompiler.cmake 2022-05-18T04:20:16.2485188Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/ 2022-05-18T04:20:16.2485783Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/ 2022-05-18T04:20:16.2535615Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp1.ii 2022-05-18T04:20:16.2536364Z 
inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.c 2022-05-18T04:20:16.2537099Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.gpu 2022-05-18T04:20:16.2537854Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.stub.c 2022-05-18T04:20:16.2538705Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.module_id 2022-05-18T04:20:16.2539447Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.ptx 2022-05-18T04:20:16.2540170Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.sm_52.cubin 2022-05-18T04:20:16.2540868Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin 2022-05-18T04:20:16.2541594Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.fatbin.c 2022-05-18T04:20:16.2579436Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cpp4.ii 2022-05-18T04:20:16.2618325Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.cudafe1.cpp 2022-05-18T04:20:16.2619322Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/CMakeCUDACompilerId.o 2022-05-18T04:20:16.2620281Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.sm_52.cubin 2022-05-18T04:20:16.2620972Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.reg.c 2022-05-18T04:20:16.2621624Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.fatbin 2022-05-18T04:20:16.2622543Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.fatbin.c 2022-05-18T04:20:16.2623463Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/tmp/a_dlink.o 2022-05-18T04:20:16.2624901Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/CMakeCUDACompilerId.cu 2022-05-18T04:20:16.2693782Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CompilerIdCUDA/a.out 2022-05-18T04:20:16.2761880Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CMakeDetermineCompilerABI_CUDA.bin 2022-05-18T04:20:16.2762550Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/3.10.3/CMakeCUDACompiler.cmake 2022-05-18T04:20:16.2763140Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeTmp/ 2022-05-18T04:20:16.2763702Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/feature_tests.c 2022-05-18T04:20:16.2764614Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/feature_tests.cxx 2022-05-18T04:20:16.2766806Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/feature_tests.bin 2022-05-18T04:20:16.2767393Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeError.log 2022-05-18T04:20:16.2767979Z inflating: 
build/custom_test_artifacts/custom-backend-build/CMakeFiles/cmake.check_cache 2022-05-18T04:20:16.2768558Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/ 2022-05-18T04:20:16.2797542Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.make 2022-05-18T04:20:16.2798188Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/link.txt 2022-05-18T04:20:16.2798837Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/cmake_clean.cmake 2022-05-18T04:20:16.2799857Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/build.make 2022-05-18T04:20:16.2800705Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/DependInfo.cmake 2022-05-18T04:20:16.2801379Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/flags.make 2022-05-18T04:20:16.2802156Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/progress.make 2022-05-18T04:20:16.2860593Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/CXX.includecache 2022-05-18T04:20:16.2878605Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/depend.internal 2022-05-18T04:20:16.2934967Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/test_custom_backend.dir/test_custom_backend.cpp.o 2022-05-18T04:20:16.2935596Z creating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/ 2022-05-18T04:20:16.2940501Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.make 2022-05-18T04:20:16.2941114Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/link.txt 2022-05-18T04:20:16.2941758Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/cmake_clean.cmake 2022-05-18T04:20:16.2942635Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/build.make 2022-05-18T04:20:16.2943443Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/DependInfo.cmake 2022-05-18T04:20:16.2944089Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/flags.make 2022-05-18T04:20:16.2944867Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/progress.make 2022-05-18T04:20:16.2952501Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/CXX.includecache 2022-05-18T04:20:16.2956201Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/depend.internal 2022-05-18T04:20:16.3099885Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/custom_backend.dir/custom_backend.cpp.o 2022-05-18T04:20:16.3100555Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/CMakeDirectoryInformation.cmake 2022-05-18T04:20:16.3101338Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/TargetDirectories.txt 2022-05-18T04:20:16.3101911Z extracting: build/custom_test_artifacts/custom-backend-build/CMakeFiles/progress.marks 2022-05-18T04:20:16.3102779Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile2 
2022-05-18T04:20:16.3104079Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeFiles/Makefile.cmake 2022-05-18T04:20:16.3104640Z inflating: build/custom_test_artifacts/custom-backend-build/detect_cuda_version.cc 2022-05-18T04:20:16.3107444Z inflating: build/custom_test_artifacts/custom-backend-build/CMakeCache.txt 2022-05-18T04:20:16.3108261Z inflating: build/custom_test_artifacts/custom-backend-build/Makefile 2022-05-18T04:20:16.3108978Z inflating: build/custom_test_artifacts/custom-backend-build/cmake_install.cmake 2022-05-18T04:20:16.3226313Z inflating: build/custom_test_artifacts/custom-backend-build/libcustom_backend.so 2022-05-18T04:20:16.3270968Z inflating: build/custom_test_artifacts/custom-backend-build/test_custom_backend 2022-05-18T04:20:16.3271337Z creating: build/lib/ 2022-05-18T04:20:16.3272000Z inflating: build/lib/libclog.a 2022-05-18T04:20:16.3336598Z inflating: build/lib/libgtest.a 2022-05-18T04:20:16.3346773Z inflating: build/lib/libpthreadpool.a 2022-05-18T04:20:16.3434475Z inflating: build/lib/libbenchmark.a 2022-05-18T04:20:16.3540317Z inflating: build/lib/libprotobuf-lite.a 2022-05-18T04:20:16.3572264Z inflating: build/lib/libtensorpipe_uv.a 2022-05-18T04:20:16.3628103Z inflating: build/lib/libasmjit.a 2022-05-18T04:20:16.3760505Z inflating: build/lib/libgloo.a 2022-05-18T04:20:16.4292729Z inflating: build/lib/libprotobuf.a 2022-05-18T04:20:16.4312145Z inflating: build/lib/libfmt.a 2022-05-18T04:20:16.4313046Z inflating: build/lib/libfoxi_loader.a 2022-05-18T04:20:16.4379592Z inflating: build/lib/libc10.so 2022-05-18T04:20:16.4380679Z inflating: build/lib/libtorch_global_deps.so 2022-05-18T04:20:16.4382716Z inflating: build/lib/libcaffe2_nvrtc.so 2022-05-18T04:20:16.4392270Z inflating: build/lib/libcpuinfo.a 2022-05-18T04:20:16.4401199Z inflating: build/lib/libcpuinfo_internals.a 2022-05-18T04:20:16.4417207Z inflating: build/lib/libqnnpack.a 2022-05-18T04:20:16.4985766Z inflating: build/lib/libprotoc.a 2022-05-18T04:20:16.5009604Z inflating: build/lib/libpytorch_qnnpack.a 2022-05-18T04:20:16.5012371Z inflating: build/lib/libnnpack_reference_layers.a 2022-05-18T04:20:16.5030776Z inflating: build/lib/libgmock.a 2022-05-18T04:20:16.5031352Z inflating: build/lib/libgtest_main.a 2022-05-18T04:20:16.5032303Z inflating: build/lib/libbenchmark_main.a 2022-05-18T04:20:17.3136061Z inflating: build/lib/libdnnl.a 2022-05-18T04:20:17.3158950Z inflating: build/lib/libnnpack.a 2022-05-18T04:20:17.3812985Z inflating: build/lib/libtensorpipe.a 2022-05-18T04:20:17.3857153Z inflating: build/lib/libc10_cuda.so 2022-05-18T04:20:17.5377634Z inflating: build/lib/libfbgemm.a 2022-05-18T04:20:17.5378443Z inflating: build/lib/libgmock_main.a 2022-05-18T04:20:17.5805973Z inflating: build/lib/libkineto.a 2022-05-18T04:20:17.6934167Z inflating: build/lib/libdnnl_graph.a 2022-05-18T04:20:17.6979516Z inflating: build/lib/libcaffe2_protos.a 2022-05-18T04:20:17.7027297Z inflating: build/lib/libonnx_proto.a 2022-05-18T04:20:17.7317333Z inflating: build/lib/libtensorpipe_cuda.a 2022-05-18T04:20:17.7980690Z inflating: build/lib/libonnx.a 2022-05-18T04:20:17.8394284Z inflating: build/lib/libgloo_cuda.a 2022-05-18T04:20:17.8408812Z inflating: build/lib/libtest_deploy_lib.so 2022-05-18T04:20:18.3450436Z inflating: build/lib/libtorch_python_static.a 2022-05-18T04:20:19.0148836Z inflating: build/lib/libtorch_deployinterpreter.so 2022-05-18T04:20:21.1069812Z inflating: build/lib/libtorch_cpu.so 2022-05-18T04:20:21.5485106Z inflating: build/lib/libtorch_cuda_cpp.so 2022-05-18T04:20:23.1488669Z 
inflating: build/lib/libtorch_cuda_cu.so 2022-05-18T04:20:23.1489416Z inflating: build/lib/libtorch_cuda.so 2022-05-18T04:20:23.1491422Z inflating: build/lib/libtorch.so 2022-05-18T04:20:23.1494931Z inflating: build/lib/libc10d_cuda_test.so 2022-05-18T04:20:24.1353465Z inflating: build/lib/libtorch_cuda_linalg.so 2022-05-18T04:20:24.1376923Z inflating: build/lib/libjitbackend_test.so 2022-05-18T04:20:24.1407486Z inflating: build/lib/libbackend_with_compiler.so 2022-05-18T04:20:24.1460623Z inflating: build/lib/libtorchbind_test.so 2022-05-18T04:20:24.1465372Z inflating: build/lib/libshm.so 2022-05-18T04:20:24.8216122Z inflating: build/lib/libtorch_deploy_internal.a 2022-05-18T04:20:24.9804033Z inflating: build/lib/libtorch_python.so 2022-05-18T04:20:24.9842027Z inflating: build/lib/libnnapi_backend.so 2022-05-18T04:20:24.9842352Z creating: build/bin/ 2022-05-18T04:20:24.9854482Z inflating: build/bin/remove_dt_needed 2022-05-18T04:20:24.9910833Z inflating: build/bin/c10_registry_test 2022-05-18T04:20:24.9987906Z inflating: build/bin/c10_optional_test 2022-05-18T04:20:25.0159383Z inflating: build/bin/c10_intrusive_ptr_test 2022-05-18T04:20:25.0210654Z inflating: build/bin/c10_flags_test 2022-05-18T04:20:25.0264671Z inflating: build/bin/c10_exception_test 2022-05-18T04:20:25.0323915Z inflating: build/bin/c10_logging_test 2022-05-18T04:20:25.0436372Z inflating: build/bin/c10_either_test 2022-05-18T04:20:25.0493544Z inflating: build/bin/c10_complex_test 2022-05-18T04:20:25.0545349Z inflating: build/bin/c10_irange_test 2022-05-18T04:20:25.0602420Z inflating: build/bin/c10_bfloat16_test 2022-05-18T04:20:25.0663155Z inflating: build/bin/c10_string_view_test 2022-05-18T04:20:25.0716273Z inflating: build/bin/c10_accumulate_test 2022-05-18T04:20:25.0772322Z inflating: build/bin/c10_complex_math_test 2022-05-18T04:20:25.0826609Z inflating: build/bin/c10_Bitset_test 2022-05-18T04:20:25.0884096Z inflating: build/bin/c10_InlineStreamGuard_test 2022-05-18T04:20:25.1032181Z inflating: build/bin/c10_SmallVectorTest 2022-05-18T04:20:25.1089150Z inflating: build/bin/c10_InlineDeviceGuard_test 2022-05-18T04:20:25.1147225Z inflating: build/bin/c10_typeid_test 2022-05-18T04:20:25.1205644Z inflating: build/bin/c10_SizesAndStrides_test 2022-05-18T04:20:25.1258515Z inflating: build/bin/c10_tempfile_test 2022-05-18T04:20:25.1308885Z inflating: build/bin/c10_CompileTimeFunctionPointer_test 2022-05-18T04:20:25.1358207Z inflating: build/bin/c10_StreamGuard_test 2022-05-18T04:20:25.1421640Z inflating: build/bin/c10_ordered_preserving_dict_test 2022-05-18T04:20:25.1481129Z inflating: build/bin/c10_DispatchKeySet_test 2022-05-18T04:20:25.1540119Z inflating: build/bin/c10_ThreadLocal_test 2022-05-18T04:20:25.1593397Z inflating: build/bin/c10_DeviceGuard_test 2022-05-18T04:20:25.1645746Z inflating: build/bin/c10_C++17_test 2022-05-18T04:20:25.1697753Z inflating: build/bin/c10_Device_test 2022-05-18T04:20:25.1747006Z inflating: build/bin/c10_TypeTraits_test 2022-05-18T04:20:25.1797898Z inflating: build/bin/c10_DeadlockDetection_test 2022-05-18T04:20:25.1848984Z inflating: build/bin/c10_Half_test 2022-05-18T04:20:25.1906870Z inflating: build/bin/c10_LeftRight_test 2022-05-18T04:20:25.1956077Z inflating: build/bin/c10_ConstexprCrc_test 2022-05-18T04:20:25.2020169Z inflating: build/bin/c10_Metaprogramming_test 2022-05-18T04:20:25.2069472Z inflating: build/bin/c10_Array_test 2022-05-18T04:20:25.2120915Z inflating: build/bin/c10_Synchronized_test 2022-05-18T04:20:25.2175066Z inflating: build/bin/c10_TypeIndex_test 
2022-05-18T04:20:25.2226866Z inflating: build/bin/c10_TypeList_test 2022-05-18T04:20:25.2284445Z inflating: build/bin/c10_intrusive_ptr_benchmark 2022-05-18T04:20:25.2800721Z inflating: build/bin/protoc-3.13.0.0 2022-05-18T04:20:25.3316417Z inflating: build/bin/protoc 2022-05-18T04:20:25.3366153Z inflating: build/bin/c10_cuda_CUDATest 2022-05-18T04:20:25.3674168Z inflating: build/bin/vec_test_all_types_DEFAULT 2022-05-18T04:20:25.4018072Z inflating: build/bin/vec_test_all_types_AVX2 2022-05-18T04:20:25.4073216Z inflating: build/bin/HashStoreTest 2022-05-18T04:20:25.4128026Z inflating: build/bin/FileStoreTest 2022-05-18T04:20:25.4190500Z inflating: build/bin/TCPStoreTest 2022-05-18T04:20:25.4205129Z inflating: build/bin/ProcessGroupMPITest 2022-05-18T04:20:25.4274937Z inflating: build/bin/cuda_cub_test 2022-05-18T04:20:25.4278033Z inflating: build/bin/example_allreduce 2022-05-18T04:20:25.4328238Z inflating: build/bin/cuda_cudnn_test 2022-05-18T04:20:25.4391044Z inflating: build/bin/cuda_stream_test 2022-05-18T04:20:25.4444764Z inflating: build/bin/cuda_apply_test 2022-05-18T04:20:25.4498935Z inflating: build/bin/cuda_reportMemoryUsage_test 2022-05-18T04:20:25.4569399Z inflating: build/bin/cuda_complex_math_test 2022-05-18T04:20:25.4630002Z inflating: build/bin/cuda_atomic_ops_test 2022-05-18T04:20:25.4684271Z inflating: build/bin/inline_container_test 2022-05-18T04:20:25.4733766Z inflating: build/bin/op_allowlist_test 2022-05-18T04:20:25.4788715Z inflating: build/bin/cuda_caching_host_allocator_test 2022-05-18T04:20:25.4843424Z inflating: build/bin/cuda_vectorized_test 2022-05-18T04:20:25.4894015Z inflating: build/bin/cuda_optional_test 2022-05-18T04:20:25.4955397Z inflating: build/bin/apply_utils_test 2022-05-18T04:20:25.5025656Z inflating: build/bin/cuda_distributions_test 2022-05-18T04:20:25.5078433Z inflating: build/bin/cuda_integer_divider_test 2022-05-18T04:20:25.5137702Z inflating: build/bin/quantized_test 2022-05-18T04:20:25.5198040Z inflating: build/bin/cuda_generator_test 2022-05-18T04:20:25.5248902Z inflating: build/bin/variant_test 2022-05-18T04:20:25.5309499Z inflating: build/bin/cpu_generator_test 2022-05-18T04:20:25.5360869Z inflating: build/bin/dispatch_key_set_test 2022-05-18T04:20:25.5423232Z inflating: build/bin/type_test 2022-05-18T04:20:25.5553661Z inflating: build/bin/kernel_lambda_legacy_test 2022-05-18T04:20:25.5652204Z inflating: build/bin/make_boxed_from_unboxed_functor_test 2022-05-18T04:20:25.5709801Z inflating: build/bin/test_parallel 2022-05-18T04:20:25.5764019Z inflating: build/bin/cpu_profiling_allocator_test 2022-05-18T04:20:25.5816442Z inflating: build/bin/reportMemoryUsage_test 2022-05-18T04:20:25.5867243Z inflating: build/bin/reduce_ops_test 2022-05-18T04:20:25.5926630Z inflating: build/bin/scalar_test 2022-05-18T04:20:25.6025902Z inflating: build/bin/kernel_function_test 2022-05-18T04:20:25.6323287Z inflating: build/bin/op_registration_test 2022-05-18T04:20:25.6398072Z inflating: build/bin/Dict_test 2022-05-18T04:20:25.6452019Z inflating: build/bin/Dimname_test 2022-05-18T04:20:25.6513693Z inflating: build/bin/basic 2022-05-18T04:20:25.6566658Z inflating: build/bin/memory_overlapping_test 2022-05-18T04:20:25.6618083Z inflating: build/bin/cuda_half_test 2022-05-18T04:20:25.6671527Z inflating: build/bin/cuda_packedtensoraccessor_test 2022-05-18T04:20:25.6738925Z inflating: build/bin/pow_test 2022-05-18T04:20:25.6740247Z inflating: build/bin/verify_api_visibility 2022-05-18T04:20:25.6801444Z inflating: build/bin/cuda_complex_test 
2022-05-18T04:20:25.6860771Z inflating: build/bin/NamedTensor_test 2022-05-18T04:20:25.6913145Z inflating: build/bin/weakref_test 2022-05-18T04:20:25.6971315Z inflating: build/bin/extension_backend_test 2022-05-18T04:20:25.7028793Z inflating: build/bin/half_test 2022-05-18T04:20:25.7081098Z inflating: build/bin/wrapdim_test 2022-05-18T04:20:25.7137560Z inflating: build/bin/broadcast_test 2022-05-18T04:20:25.7188700Z inflating: build/bin/dlconvertor_test 2022-05-18T04:20:25.7246664Z inflating: build/bin/scalar_tensor_test 2022-05-18T04:20:25.7304188Z inflating: build/bin/native_test 2022-05-18T04:20:25.7357894Z inflating: build/bin/undefined_tensor_test 2022-05-18T04:20:25.7470771Z inflating: build/bin/List_test 2022-05-18T04:20:25.7550244Z inflating: build/bin/tensor_iterator_test 2022-05-18T04:20:25.7602078Z inflating: build/bin/CppSignature_test 2022-05-18T04:20:25.7604705Z inflating: build/bin/thread_init_test 2022-05-18T04:20:25.7659137Z inflating: build/bin/math_kernel_test 2022-05-18T04:20:25.7709591Z inflating: build/bin/lazy_tensor_test 2022-05-18T04:20:25.7763555Z inflating: build/bin/memory_format_test 2022-05-18T04:20:25.7815441Z inflating: build/bin/operators_test 2022-05-18T04:20:25.7866657Z inflating: build/bin/cuda_dlconvertor_test 2022-05-18T04:20:25.7974286Z inflating: build/bin/kernel_lambda_test 2022-05-18T04:20:25.8044328Z inflating: build/bin/vmap_test 2022-05-18T04:20:25.8105889Z inflating: build/bin/IListRef_test 2022-05-18T04:20:25.8203450Z inflating: build/bin/ivalue_test 2022-05-18T04:20:25.8252855Z inflating: build/bin/mobile_memory_cleanup 2022-05-18T04:20:25.8314439Z inflating: build/bin/kernel_stackbased_test 2022-05-18T04:20:25.8367304Z inflating: build/bin/stride_properties_test 2022-05-18T04:20:25.8427854Z inflating: build/bin/atest 2022-05-18T04:20:25.8493946Z inflating: build/bin/KernelFunction_test 2022-05-18T04:20:25.8552034Z inflating: build/bin/backend_fallback_test 2022-05-18T04:20:25.8643170Z inflating: build/bin/cpu_rng_test 2022-05-18T04:20:25.8766863Z inflating: build/bin/kernel_function_legacy_test 2022-05-18T04:20:25.8833952Z inflating: build/bin/ProcessGroupGlooTest 2022-05-18T04:20:25.8895026Z inflating: build/bin/ProcessGroupGlooAsyncTest 2022-05-18T04:20:25.8957982Z inflating: build/bin/ProcessGroupNCCLTest 2022-05-18T04:20:25.8975139Z inflating: build/bin/tutorial_tensorexpr 2022-05-18T04:20:25.9035846Z inflating: build/bin/ProcessGroupNCCLErrorsTest 2022-05-18T04:20:25.9090527Z inflating: build/bin/test_dist_autograd 2022-05-18T04:20:25.9161789Z inflating: build/bin/test_mobile_nnc 2022-05-18T04:20:25.9164427Z inflating: build/bin/parallel_benchmark 2022-05-18T04:20:25.9236594Z inflating: build/bin/test_cpp_rpc 2022-05-18T04:20:25.9242346Z inflating: build/bin/torch_shm_manager 2022-05-18T04:20:25.9253659Z inflating: build/bin/aot_model_compiler_test 2022-05-18T04:20:25.9617267Z inflating: build/bin/test_lazy 2022-05-18T04:20:26.0492881Z inflating: build/bin/test_tensorexpr 2022-05-18T04:20:26.7231797Z inflating: build/bin/interactive_embedded_interpreter 2022-05-18T04:20:26.7360981Z inflating: build/bin/nvfuser_bench 2022-05-18T04:20:27.4151046Z inflating: build/bin/test_deploy 2022-05-18T04:20:28.0922448Z inflating: build/bin/test_deploy_gpu 2022-05-18T04:20:28.7652137Z inflating: build/bin/deploy_benchmark 2022-05-18T04:20:28.8575848Z inflating: build/bin/test_jit 2022-05-18T04:20:29.6543908Z inflating: build/bin/test_api 2022-05-18T04:20:29.6545013Z inflating: .pytorch-test-times.json 2022-05-18T04:20:29.6573758Z ##[group]Run df -H 
2022-05-18T04:20:29.6574024Z df -H 2022-05-18T04:20:29.6587425Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T04:20:29.6587733Z env: 2022-05-18T04:20:29.6587959Z IN_CI: 1 2022-05-18T04:20:29.6588169Z IS_GHA: 1 2022-05-18T04:20:29.6588422Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:20:29.6588694Z GPU_FLAG: --gpus all 2022-05-18T04:20:29.6588929Z ##[endgroup] 2022-05-18T04:20:29.6627338Z Filesystem Size Used Avail Use% Mounted on 2022-05-18T04:20:29.6627849Z devtmpfs 129G 0 129G 0% /dev 2022-05-18T04:20:29.6628135Z tmpfs 129G 6.8M 129G 1% /dev/shm 2022-05-18T04:20:29.6628426Z tmpfs 129G 533k 129G 1% /run 2022-05-18T04:20:29.6629144Z tmpfs 129G 0 129G 0% /sys/fs/cgroup 2022-05-18T04:20:29.6629455Z /dev/xvda1 162G 27G 135G 17% / 2022-05-18T04:20:29.6651081Z ##[group]Run .github/scripts/parse_ref.py 2022-05-18T04:20:29.6651474Z .github/scripts/parse_ref.py 2022-05-18T04:20:29.6663950Z shell: /usr/bin/bash -e {0} 2022-05-18T04:20:29.6664200Z env: 2022-05-18T04:20:29.6664428Z IN_CI: 1 2022-05-18T04:20:29.6664663Z IS_GHA: 1 2022-05-18T04:20:29.6664908Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:20:29.6665193Z GPU_FLAG: --gpus all 2022-05-18T04:20:29.6665454Z ##[endgroup] 2022-05-18T04:20:29.7015183Z ##[group]Run set -x 2022-05-18T04:20:29.7015590Z set -x 2022-05-18T04:20:29.7015998Z  2022-05-18T04:20:29.7016298Z if [[ $TEST_CONFIG == 'multigpu' ]]; then 2022-05-18T04:20:29.7016641Z  TEST_COMMAND=.jenkins/pytorch/multigpu-test.sh 2022-05-18T04:20:29.7016998Z elif [[ $BUILD_ENVIRONMENT == *onnx* ]]; then 2022-05-18T04:20:29.7017333Z  TEST_COMMAND=.jenkins/caffe2/test.sh 2022-05-18T04:20:29.7017595Z else 2022-05-18T04:20:29.7017879Z  TEST_COMMAND=.jenkins/pytorch/test.sh 2022-05-18T04:20:29.7018171Z fi 2022-05-18T04:20:29.7018399Z  2022-05-18T04:20:29.7018711Z COMMIT_MESSAGES=$(git cherry -v "origin/${GIT_DEFAULT_BRANCH:-master}") 2022-05-18T04:20:29.7019067Z export COMMIT_MESSAGES 2022-05-18T04:20:29.7019324Z  2022-05-18T04:20:29.7019625Z # detached container should get cleaned up by teardown_ec2_linux 2022-05-18T04:20:29.7020062Z # TODO: Stop building test binaries as part of the build phase 2022-05-18T04:20:29.7020449Z # Used for GPU_FLAG since that doesn't play nice 2022-05-18T04:20:29.7020770Z # shellcheck disable=SC2086,SC2090 2022-05-18T04:20:29.7021079Z container_name=$(docker run \ 2022-05-18T04:20:29.7021367Z  ${GPU_FLAG:-} \ 2022-05-18T04:20:29.7021630Z  -e BUILD_ENVIRONMENT \ 2022-05-18T04:20:29.7021911Z  -e PR_NUMBER \ 2022-05-18T04:20:29.7022214Z  -e CUSTOM_TEST_ARTIFACT_BUILD_DIR \ 2022-05-18T04:20:29.7022502Z  -e GITHUB_ACTIONS \ 2022-05-18T04:20:29.7022767Z  -e IN_CI \ 2022-05-18T04:20:29.7023020Z  -e IS_GHA \ 2022-05-18T04:20:29.7023253Z  -e BRANCH \ 2022-05-18T04:20:29.7023505Z  -e SHA1 \ 2022-05-18T04:20:29.7023774Z  -e AWS_DEFAULT_REGION \ 2022-05-18T04:20:29.7024053Z  -e IN_WHEEL_TEST \ 2022-05-18T04:20:29.7024311Z  -e SHARD_NUMBER \ 2022-05-18T04:20:29.7024580Z  -e JOB_BASE_NAME \ 2022-05-18T04:20:29.7024847Z  -e TEST_CONFIG \ 2022-05-18T04:20:29.7025103Z  -e NUM_TEST_SHARDS \ 2022-05-18T04:20:29.7025372Z  -e PR_BODY \ 2022-05-18T04:20:29.7025642Z  -e COMMIT_MESSAGES \ 2022-05-18T04:20:29.7025921Z  -e PYTORCH_RETRY_TEST_CASES \ 2022-05-18T04:20:29.7026207Z  -e PR_LABELS \ 2022-05-18T04:20:29.7026502Z  -e MAX_JOBS="$(nproc --ignore=2)" \ 2022-05-18T04:20:29.7026784Z  -e SCCACHE_BUCKET \ 2022-05-18T04:20:29.7027057Z  -e XLA_CUDA \ 2022-05-18T04:20:29.7027405Z  -e XLA_CLANG_CACHE_S3_BUCKET_NAME \ 2022-05-18T04:20:29.7027743Z  
--env-file="/tmp/github_env_${GITHUB_RUN_ID}" \ 2022-05-18T04:20:29.7028077Z  --ulimit stack=10485760:83886080 \ 2022-05-18T04:20:29.7028399Z  --security-opt seccomp=unconfined \ 2022-05-18T04:20:29.7028719Z  --cap-add=SYS_PTRACE \ 2022-05-18T04:20:29.7028976Z  --ipc=host \ 2022-05-18T04:20:29.7029256Z  --shm-size="${SHM_SIZE}" \ 2022-05-18T04:20:29.7029524Z  --tty \ 2022-05-18T04:20:29.7029753Z  --detach \ 2022-05-18T04:20:29.7030028Z  --name="${container_name}" \ 2022-05-18T04:20:29.7030315Z  --user jenkins \ 2022-05-18T04:20:29.7030659Z  -v "${GITHUB_WORKSPACE}:/var/lib/jenkins/workspace" \ 2022-05-18T04:20:29.7031039Z  -w /var/lib/jenkins/workspace \ 2022-05-18T04:20:29.7031476Z  "${DOCKER_IMAGE}" 2022-05-18T04:20:29.7031936Z ) 2022-05-18T04:20:29.7032292Z docker exec -t "${container_name}" sh -c "pip install dist/*.whl && ${TEST_COMMAND}" 2022-05-18T04:20:29.7044325Z shell: /usr/bin/bash -e {0} 2022-05-18T04:20:29.7044641Z env: 2022-05-18T04:20:29.7044868Z IN_CI: 1 2022-05-18T04:20:29.7045156Z IS_GHA: 1 2022-05-18T04:20:29.7045472Z GIT_DEFAULT_BRANCH: master 2022-05-18T04:20:29.7045871Z GPU_FLAG: --gpus all 2022-05-18T04:20:29.7046265Z BUILD_ENVIRONMENT: linux-xenial-cuda11.3-py3.7-gcc7 2022-05-18T04:20:29.7046641Z PR_NUMBER: 2022-05-18T04:20:29.7046891Z BRANCH: master 2022-05-18T04:20:29.7047270Z CUSTOM_TEST_ARTIFACT_BUILD_DIR: build/custom_test_artifacts 2022-05-18T04:20:29.7047757Z SHA1: 3b2375291aab7b48442f2e6fb1ef66cebc761e24 2022-05-18T04:20:29.7048122Z PYTORCH_RETRY_TEST_CASES: 1 2022-05-18T04:20:29.7048477Z JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-test 2022-05-18T04:20:29.7048868Z TEST_CONFIG: distributed 2022-05-18T04:20:29.7049216Z SHARD_NUMBER: 2 2022-05-18T04:20:29.7049475Z NUM_TEST_SHARDS: 2 2022-05-18T04:20:29.7049778Z PR_BODY: 2022-05-18T04:20:29.7050133Z SCCACHE_BUCKET: ossci-compiler-cache-circleci-v2 2022-05-18T04:20:29.7050752Z SHM_SIZE: 2g 2022-05-18T04:20:29.7051350Z DOCKER_IMAGE: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T04:20:29.7051898Z XLA_CUDA: 2022-05-18T04:20:29.7052305Z XLA_CLANG_CACHE_S3_BUCKET_NAME: ossci-compiler-clang-cache-circleci-xla 2022-05-18T04:20:29.7052691Z ##[endgroup] 2022-05-18T04:20:29.7081878Z + [[ distributed == \m\u\l\t\i\g\p\u ]] 2022-05-18T04:20:29.7082413Z + [[ linux-xenial-cuda11.3-py3.7-gcc7 == *onnx* ]] 2022-05-18T04:20:29.7082902Z + TEST_COMMAND=.jenkins/pytorch/test.sh 2022-05-18T04:20:29.7085792Z ++ git cherry -v origin/master 2022-05-18T04:20:29.7118694Z + COMMIT_MESSAGES= 2022-05-18T04:20:29.7119027Z + export COMMIT_MESSAGES 2022-05-18T04:20:29.7128080Z +++ nproc --ignore=2 2022-05-18T04:20:29.7140555Z ++ docker run --gpus all -e BUILD_ENVIRONMENT -e PR_NUMBER -e CUSTOM_TEST_ARTIFACT_BUILD_DIR -e GITHUB_ACTIONS -e IN_CI -e IS_GHA -e BRANCH -e SHA1 -e AWS_DEFAULT_REGION -e IN_WHEEL_TEST -e SHARD_NUMBER -e JOB_BASE_NAME -e TEST_CONFIG -e NUM_TEST_SHARDS -e PR_BODY -e COMMIT_MESSAGES -e PYTORCH_RETRY_TEST_CASES -e PR_LABELS -e MAX_JOBS=30 -e SCCACHE_BUCKET -e XLA_CUDA -e XLA_CLANG_CACHE_S3_BUCKET_NAME --env-file=/tmp/github_env_2342799944 --ulimit stack=10485760:83886080 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE --ipc=host --shm-size=2g --tty --detach --name= --user jenkins -v /home/ec2-user/actions-runner/_work/pytorch/pytorch:/var/lib/jenkins/workspace -w /var/lib/jenkins/workspace 
308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T04:20:53.3300177Z + container_name=ee34c49c9c62c22ef7a6ae17e6a604c5c0073de7fa7971bf62d1b5af644989c0 2022-05-18T04:20:53.3301089Z + docker exec -t ee34c49c9c62c22ef7a6ae17e6a604c5c0073de7fa7971bf62d1b5af644989c0 sh -c 'pip install dist/*.whl && .jenkins/pytorch/test.sh' 2022-05-18T04:20:53.8239614Z Processing ./dist/torch-1.12.0a0+git3b23752-cp37-cp37m-linux_x86_64.whl 2022-05-18T04:20:53.9252540Z Requirement already satisfied: typing-extensions in /opt/conda/lib/python3.7/site-packages (from torch==1.12.0a0+git3b23752) (4.1.1) 2022-05-18T04:20:54.4787102Z Installing collected packages: torch 2022-05-18T04:21:06.2685109Z Successfully installed torch-1.12.0a0+git3b23752 2022-05-18T04:21:06.3192341Z + COMPACT_JOB_NAME=linux-xenial-cuda11.3-py3.7-gcc7 2022-05-18T04:21:06.3195503Z ++ python -c 'import site; print(site.getsitepackages()[0])' 2022-05-18T04:21:06.3410704Z + TORCH_INSTALL_DIR=/opt/conda/lib/python3.7/site-packages/torch 2022-05-18T04:21:06.3411196Z + TORCH_BIN_DIR=/opt/conda/lib/python3.7/site-packages/torch/bin 2022-05-18T04:21:06.3411652Z + TORCH_LIB_DIR=/opt/conda/lib/python3.7/site-packages/torch/lib 2022-05-18T04:21:06.3412425Z + TORCH_TEST_DIR=/opt/conda/lib/python3.7/site-packages/torch/test 2022-05-18T04:21:06.3414629Z + BUILD_DIR=build 2022-05-18T04:21:06.3415236Z + BUILD_RENAMED_DIR=build_renamed 2022-05-18T04:21:06.3415539Z + BUILD_BIN_DIR=build/bin 2022-05-18T04:21:06.3415903Z + [[ -n distributed ]] 2022-05-18T04:21:06.3416345Z + BUILD_ENVIRONMENT=linux-xenial-cuda11.3-py3.7-gcc7-distributed 2022-05-18T04:21:06.3417057Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed != *bazel* ]] 2022-05-18T04:21:06.3417467Z ++ realpath build/custom_test_artifacts 2022-05-18T04:21:06.3422604Z + CUSTOM_TEST_ARTIFACT_BUILD_DIR=/var/lib/jenkins/workspace/build/custom_test_artifacts 2022-05-18T04:21:06.3426769Z ++ dirname .jenkins/pytorch/test.sh 2022-05-18T04:21:06.3433361Z + source .jenkins/pytorch/common.sh 2022-05-18T04:21:06.3437800Z +++ dirname .jenkins/pytorch/common.sh 2022-05-18T04:21:06.3447507Z ++ source .jenkins/pytorch/common_utils.sh 2022-05-18T04:21:06.3452841Z +++ TORCHVISION_COMMIT=8a2dc6f22ac4389ccba8859aa1e1cb14f1ee53db 2022-05-18T04:21:06.3453258Z ++ set -ex 2022-05-18T04:21:06.3460161Z ++++ dirname .jenkins/pytorch/common.sh 2022-05-18T04:21:06.3469556Z +++ cd .jenkins/pytorch 2022-05-18T04:21:06.3469857Z +++ pwd -P 2022-05-18T04:21:06.3472880Z ++ SCRIPT_DIR=/var/lib/jenkins/workspace/.jenkins/pytorch 2022-05-18T04:21:06.3473377Z ++ [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *linux* ]] 2022-05-18T04:21:06.3477186Z +++ find /etc/apt/ -type f -name '*.list' 2022-05-18T04:21:06.3495236Z ++ sudo sed -i 's/.*nvidia.*/# &/' /etc/apt/sources.list /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nodesource.list /etc/apt/sources.list.d/ubuntu-toolchain-r-ubuntu-test-xenial.list /etc/apt/sources.list.d/yarn.list 2022-05-18T04:21:06.3555201Z ++ [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *rocm* ]] 2022-05-18T04:21:06.3555804Z ++ echo ENTERED_USER_LAND 2022-05-18T04:21:06.3556063Z ENTERED_USER_LAND 2022-05-18T04:21:06.3556316Z ++ export IN_CI=1 2022-05-18T04:21:06.3556554Z ++ IN_CI=1 2022-05-18T04:21:06.3556850Z ++ declare -f -t trap_add 2022-05-18T04:21:06.3557127Z ++ trap_add cleanup EXIT 2022-05-18T04:21:06.3557401Z ++ trap_add_cmd=cleanup 2022-05-18T04:21:06.3557631Z ++ shift 
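For reference, the container launch and test invocation traced above can be approximated outside this workflow with the sketch below. It assumes the DOCKER_IMAGE shown in the step environment has already been pulled, a wheel has been built into dist/, and the host has working NVIDIA GPU support for Docker; the env-file, sccache, and XLA variables from the real step are omitted, and container_name is just an illustrative shell variable.

  # Launch a detached test container mirroring the flags visible in the trace above.
  DOCKER_IMAGE=308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734
  container_name=$(docker run \
    --gpus all \
    -e BUILD_ENVIRONMENT=linux-xenial-cuda11.3-py3.7-gcc7 \
    -e TEST_CONFIG=distributed -e SHARD_NUMBER=2 -e NUM_TEST_SHARDS=2 \
    --ulimit stack=10485760:83886080 --security-opt seccomp=unconfined --cap-add=SYS_PTRACE \
    --ipc=host --shm-size=2g --tty --detach --user jenkins \
    -v "$(pwd):/var/lib/jenkins/workspace" -w /var/lib/jenkins/workspace \
    "${DOCKER_IMAGE}")
  # Install the freshly built wheel and run the same test entry point as the job.
  docker exec -t "${container_name}" sh -c "pip install dist/*.whl && .jenkins/pytorch/test.sh"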
2022-05-18T04:21:06.3557927Z ++ for trap_add_name in '"$@"' 2022-05-18T04:21:06.3564938Z ++++ trap -p EXIT 2022-05-18T04:21:06.3568446Z +++ eval 'extract_trap_cmd ' 2022-05-18T04:21:06.3568822Z ++++ extract_trap_cmd 2022-05-18T04:21:06.3569133Z ++++ printf '%s\n' '' 2022-05-18T04:21:06.3569727Z +++ printf '%s\n' cleanup 2022-05-18T04:21:06.3572357Z ++ trap -- ' 2022-05-18T04:21:06.3572647Z cleanup' EXIT 2022-05-18T04:21:06.3575276Z ++ [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed != *win-* ]] 2022-05-18T04:21:06.3575605Z ++ which sccache 2022-05-18T04:21:06.3586568Z ++ sccache --stop-server 2022-05-18T04:21:06.3610220Z ++ true 2022-05-18T04:21:06.3610880Z ++ rm -f /var/lib/jenkins/sccache_error.log 2022-05-18T04:21:06.3619491Z ++ [[ -n '' ]] 2022-05-18T04:21:06.3619916Z ++ [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *rocm* ]] 2022-05-18T04:21:06.3620331Z ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 2022-05-18T04:21:06.3620638Z ++ SCCACHE_IDLE_TIMEOUT=1200 2022-05-18T04:21:06.3637335Z ++ RUST_LOG=sccache::server=error 2022-05-18T04:21:06.3637889Z ++ sccache --start-server 2022-05-18T04:21:06.3638423Z sccache: Starting the server... 2022-05-18T04:21:06.3893965Z ++ sccache --zero-stats 2022-05-18T04:21:06.3914658Z Compile requests 0 2022-05-18T04:21:06.3915007Z Compile requests executed 0 2022-05-18T04:21:06.3915285Z Cache hits 0 2022-05-18T04:21:06.3915600Z Cache misses 0 2022-05-18T04:21:06.3915898Z Cache timeouts 0 2022-05-18T04:21:06.3916164Z Cache read errors 0 2022-05-18T04:21:06.3916448Z Forced recaches 0 2022-05-18T04:21:06.3916736Z Cache write errors 0 2022-05-18T04:21:06.3917180Z Compilation failures 0 2022-05-18T04:21:06.3917736Z Cache errors 0 2022-05-18T04:21:06.3918113Z Non-cacheable compilations 0 2022-05-18T04:21:06.3918448Z Non-cacheable calls 0 2022-05-18T04:21:06.3918795Z Non-compilation calls 0 2022-05-18T04:21:06.3919103Z Unsupported compiler calls 0 2022-05-18T04:21:06.3919414Z Average cache write 0.000 s 2022-05-18T04:21:06.3919786Z Average cache read miss 0.000 s 2022-05-18T04:21:06.3920113Z Average cache read hit 0.000 s 2022-05-18T04:21:06.3920432Z Failed distributed compilations 0 2022-05-18T04:21:06.3921151Z Cache location S3, bucket: Bucket(name=ossci-compiler-cache-circleci-v2, base_url=http://ossci-compiler-cache-circleci-v2.s3.amazonaws.com/) 2022-05-18T04:21:06.3921815Z ++ [[ linux-xenial-cuda11.3-py3.7-gcc7-test == *-build ]] 2022-05-18T04:21:06.3922140Z ++ which ccache 2022-05-18T04:21:06.3930466Z ++ '[' -z linux-xenial-cuda11.3-py3.7-gcc7 ']' 2022-05-18T04:21:06.3931407Z ++ [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *linux-trusty-py3.6-gcc7* ]] 2022-05-18T04:21:06.3932119Z ++ BUILD_TEST_LIBTORCH=0 2022-05-18T04:21:06.3932566Z ++ [[ distributed == *xla* ]] 2022-05-18T04:21:06.3932999Z ++ [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *centos* ]] 2022-05-18T04:21:06.3933534Z ++ [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *linux-bionic* ]] 2022-05-18T04:21:06.3934078Z ++ [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *linux-focal* ]] 2022-05-18T04:21:06.3934469Z + echo 'Testing pytorch' 2022-05-18T04:21:06.3934739Z Testing pytorch 2022-05-18T04:21:06.3935189Z + export LANG=C.UTF-8 2022-05-18T04:21:06.3935592Z + LANG=C.UTF-8 2022-05-18T04:21:06.3936157Z + PR_NUMBER= 2022-05-18T04:21:06.3936426Z + [[ distributed == \d\e\f\a\u\l\t ]] 2022-05-18T04:21:06.3936747Z + [[ distributed == \d\i\s\t\r\i\b\u\t\e\d ]] 2022-05-18T04:21:06.3937318Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *rocm* ]] 
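The sccache restart and counter reset in the trace above can be reproduced with the same commands; a minimal sketch, assuming sccache is installed locally (the error-log path here is illustrative, and SCCACHE_BUCKET is only needed if an S3-backed cache is actually configured):

  sccache --stop-server || true          # the trace tolerates a missing server the same way, via `true`
  export SCCACHE_ERROR_LOG=/tmp/sccache_error.log
  export SCCACHE_IDLE_TIMEOUT=1200
  export RUST_LOG=sccache::server=error
  sccache --start-server
  sccache --zero-stats                   # reset counters so later output covers only this run
  # ... compile through the sccache-wrapped compilers ...
  sccache --show-stats                   # prints the same counter table shown above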
2022-05-18T04:21:06.3938163Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *-slow-* ]] 2022-05-18T04:21:06.3938671Z + [[ distributed == \s\l\o\w ]] 2022-05-18T04:21:06.3939114Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *slow-gradcheck* ]] 2022-05-18T04:21:06.3939836Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *cuda* ]] 2022-05-18T04:21:06.3940464Z + export PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2022-05-18T04:21:06.3940799Z + PYTORCH_TESTING_DEVICE_ONLY_FOR=cuda 2022-05-18T04:21:06.3941245Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *cuda11* ]] 2022-05-18T04:21:06.3941613Z + export BUILD_SPLIT_CUDA=ON 2022-05-18T04:21:06.3941884Z + BUILD_SPLIT_CUDA=ON 2022-05-18T04:21:06.3942297Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *crossref* ]] 2022-05-18T04:21:06.3942661Z + [[ -n '' ]] 2022-05-18T04:21:06.3942951Z + export PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=0 2022-05-18T04:21:06.3943264Z + PYTORCH_TEST_SKIP_CUDA_MEM_LEAK_CHECK=0 2022-05-18T04:21:06.3943725Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *rocm* ]] 2022-05-18T04:21:06.3944241Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed != *ppc64le* ]] 2022-05-18T04:21:06.3944732Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed != *-bazel-* ]] 2022-05-18T04:21:06.3945129Z + pip_install --user ninja 2022-05-18T04:21:06.3945499Z + pip install --progress-bar off --user ninja 2022-05-18T04:21:06.9085469Z Collecting ninja 2022-05-18T04:21:06.9384289Z Downloading ninja-1.10.2.3-py2.py3-none-manylinux_2_5_x86_64.manylinux1_x86_64.whl (108 kB) 2022-05-18T04:21:07.4084869Z Installing collected packages: ninja 2022-05-18T04:21:07.4198166Z  WARNING: The script ninja is installed in '/var/lib/jenkins/.local/bin' which is not on PATH. 2022-05-18T04:21:07.4198821Z Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
2022-05-18T04:21:07.4267530Z Successfully installed ninja-1.10.2.3 2022-05-18T04:21:07.4793543Z + export PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2022-05-18T04:21:07.4794809Z + PATH=/var/lib/jenkins/.local/bin:/opt/cache/bin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2022-05-18T04:21:07.4796050Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *asan* ]] 2022-05-18T04:21:07.4796710Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *-NO_AVX-* ]] 2022-05-18T04:21:07.4797120Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X ]] 2022-05-18T04:21:07.4797576Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *-NO_AVX2-* ]] 2022-05-18T04:21:07.4797973Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]] 2022-05-18T04:21:07.4798450Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *-NO_AVX512-* ]] 2022-05-18T04:21:07.4798832Z + [[ distributed == \n\o\g\p\u\_\N\O\_\A\V\X\5\1\2 ]] 2022-05-18T04:21:07.4801414Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *tbb* ]] 2022-05-18T04:21:07.4816297Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *libtorch* ]] 2022-05-18T04:21:07.4816822Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *-bazel-* ]] 2022-05-18T04:21:07.4819693Z + cd test 2022-05-18T04:21:07.4820499Z + python -c 'import torch; print(torch.__config__.show())' 2022-05-18T04:21:11.8372128Z PyTorch built with: 2022-05-18T04:21:11.8372570Z - GCC 7.5 2022-05-18T04:21:11.8372909Z - C++ Version: 201402 2022-05-18T04:21:11.8373439Z - Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2022-05-18T04:21:11.8374018Z - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815) 2022-05-18T04:21:11.8374432Z - OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2022-05-18T04:21:11.8374797Z - LAPACK is enabled (usually provided by MKL) 2022-05-18T04:21:11.8375134Z - NNPACK is enabled 2022-05-18T04:21:11.8375455Z - CPU capability usage: AVX2 2022-05-18T04:21:11.8375767Z - CUDA Runtime 11.3 2022-05-18T04:21:11.8376159Z - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52 2022-05-18T04:21:11.8376559Z - CuDNN 8.3.2 (built against CUDA 11.5) 2022-05-18T04:21:11.8376866Z - Magma 2.5.2 2022-05-18T04:21:11.8379792Z - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/cache/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Werror -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, FORCE_FALLBACK_CUDA_MPI=1, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 2022-05-18T04:21:11.8382006Z 2022-05-18T04:21:12.4180156Z + cd test 2022-05-18T04:21:12.4180697Z + python -c 'import torch; print(torch.__config__.parallel_info())' 2022-05-18T04:21:13.2409344Z ATen/Parallel: 2022-05-18T04:21:13.2409736Z at::get_num_threads() : 16 2022-05-18T04:21:13.2410026Z at::get_num_interop_threads() : 16 2022-05-18T04:21:13.2410581Z OpenMP 201511 (a.k.a. 
OpenMP 4.5) 2022-05-18T04:21:13.2410880Z omp_get_max_threads() : 16 2022-05-18T04:21:13.2411518Z Intel(R) oneAPI Math Kernel Library Version 2022.0-Product Build 20211112 for Intel(R) 64 architecture applications 2022-05-18T04:21:13.2412226Z mkl_get_max_threads() : 16 2022-05-18T04:21:13.2412689Z Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815) 2022-05-18T04:21:13.2413067Z std::thread::hardware_concurrency() : 32 2022-05-18T04:21:13.2413349Z Environment variables: 2022-05-18T04:21:13.2413626Z OMP_NUM_THREADS : [not set] 2022-05-18T04:21:13.2413913Z MKL_NUM_THREADS : [not set] 2022-05-18T04:21:13.2414298Z ATen parallel backend: OpenMP 2022-05-18T04:21:13.2414496Z 2022-05-18T04:21:13.3505101Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *deploy* ]] 2022-05-18T04:21:13.3505698Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *backward* ]] 2022-05-18T04:21:13.3506059Z + [[ distributed == *xla* ]] 2022-05-18T04:21:13.3506498Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *jit_legacy-test ]] 2022-05-18T04:21:13.3507015Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-test == *jit_legacy-test ]] 2022-05-18T04:21:13.3507385Z + [[ distributed == \j\i\t\_\l\e\g\a\c\y ]] 2022-05-18T04:21:13.3507847Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *libtorch* ]] 2022-05-18T04:21:13.3508375Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *distributed* ]] 2022-05-18T04:21:13.3508732Z + test_distributed 2022-05-18T04:21:13.3509058Z + echo 'Testing distributed python tests' 2022-05-18T04:21:13.3509379Z Testing distributed python tests 2022-05-18T04:21:13.3509819Z + python test/run_test.py --distributed-tests --shard 2 2 --verbose 2022-05-18T04:21:19.2437851Z Ignoring disabled issues: [] 2022-05-18T04:21:19.2575515Z test/run_test.py:894: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. 2022-05-18T04:21:19.2576041Z if torch.version.cuda is not None and LooseVersion(torch.version.cuda) == "11.6": 2022-05-18T04:21:19.2639760Z Found stats for current commit: 3b2375291aab7b48442f2e6fb1ef66cebc761e24 and job: linux-xenial-cuda11.3-py3.7-gcc7. Proceeding with those values. 
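The shard executed here was assembled by run_test.py: it filters to the distributed tests, orders them using the per-test timing stats fetched for this commit, and keeps shard 2 of 2. A rough sketch of re-running the shard, or one selected file, inside the container (the first command is copied from the invocation above; running a file directly is a local-debugging assumption that mirrors the per-file command the runner issues further down, minus its --subprocess and --import-*-tests flags):

  # Re-run the whole shard exactly as the job does (from the workspace root):
  python test/run_test.py --distributed-tests --shard 2 2 --verbose
  # Or run a single selected suite directly:
  cd test && python distributed/rpc/cuda/test_tensorpipe_agent.py -v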
2022-05-18T04:21:19.2642097Z Selected tests: 2022-05-18T04:21:19.2642462Z distributed/rpc/cuda/test_tensorpipe_agent 2022-05-18T04:21:19.2642783Z distributed/fsdp/test_fsdp_core 2022-05-18T04:21:19.2643115Z distributed/test_c10d_nccl 2022-05-18T04:21:19.2643450Z distributed/fsdp/test_fsdp_mixed_precision 2022-05-18T04:21:19.2643753Z distributed/test_c10d_gloo 2022-05-18T04:21:19.2644083Z distributed/fsdp/test_fsdp_summon_full_params 2022-05-18T04:21:19.2644434Z distributed/fsdp/test_fsdp_state_dict 2022-05-18T04:21:19.2644776Z distributed/_shard/sharded_tensor/test_sharded_tensor 2022-05-18T04:21:19.2645126Z distributed/test_c10d_spawn_gloo 2022-05-18T04:21:19.2645453Z distributed/test_c10d_spawn_nccl 2022-05-18T04:21:19.2645734Z distributed/fsdp/test_wrap 2022-05-18T04:21:19.2646050Z distributed/algorithms/test_join 2022-05-18T04:21:19.2646369Z distributed/fsdp/test_fsdp_comm 2022-05-18T04:21:19.2646681Z distributed/test_c10d_common 2022-05-18T04:21:19.2646970Z distributed/fsdp/test_fsdp_meta 2022-05-18T04:21:19.2647325Z distributed/fsdp/test_fsdp_misc 2022-05-18T04:21:19.2647676Z distributed/_shard/checkpoint/test_checkpoint 2022-05-18T04:21:19.2648006Z distributed/fsdp/test_fsdp_checkpoint 2022-05-18T04:21:19.2648395Z distributed/_shard/checkpoint/test_file_system_checkpoint 2022-05-18T04:21:19.2648750Z distributed/fsdp/test_fsdp_apply 2022-05-18T04:21:19.2649051Z distributed/_shard/test_partial_tensor 2022-05-18T04:21:19.2649412Z distributed/fsdp/test_distributed_checkpoint 2022-05-18T04:21:19.2649785Z distributed/_shard/sharded_tensor/ops/test_binary_cmp 2022-05-18T04:21:19.2650159Z distributed/_shard/sharded_tensor/ops/test_elementwise_ops 2022-05-18T04:21:19.2651073Z distributed/elastic/timer/local_timer_test 2022-05-18T04:21:19.2651414Z distributed/test_data_parallel 2022-05-18T04:21:19.2651731Z distributed/fsdp/test_fsdp_multiple_wrapping 2022-05-18T04:21:19.2652078Z distributed/fsdp/test_fsdp_pure_fp16 2022-05-18T04:21:19.2652429Z distributed/_shard/sharded_tensor/ops/test_softmax 2022-05-18T04:21:19.2652830Z distributed/_shard/sharded_tensor/test_sharded_tensor_reshard 2022-05-18T04:21:19.2653486Z distributed/_shard/sharded_optim/test_sharded_optim 2022-05-18T04:21:19.2653895Z distributed/_shard/sharded_tensor/test_megatron_prototype 2022-05-18T04:21:19.2654245Z distributed/test_launcher 2022-05-18T04:21:19.2654533Z distributed/elastic/utils/util_test 2022-05-18T04:21:19.2654865Z distributed/elastic/metrics/api_test 2022-05-18T04:21:19.2655287Z distributed/fsdp/test_utils 2022-05-18T04:21:19.2655634Z distributed/_shard/sharded_tensor/ops/test_math_ops 2022-05-18T04:21:19.2655993Z distributed/_shard/test_replicated_tensor 2022-05-18T04:21:19.2656331Z distributed/elastic/events/lib_test 2022-05-18T04:21:19.2656654Z distributed/fsdp/test_shard_utils 2022-05-18T04:21:19.2656974Z distributed/pipeline/sync/skip/test_gpipe 2022-05-18T04:21:19.2657320Z distributed/pipeline/sync/skip/test_leak 2022-05-18T04:21:19.2657675Z distributed/pipeline/sync/skip/test_stash_pop 2022-05-18T04:21:19.2658033Z distributed/pipeline/sync/skip/test_verify_skippables 2022-05-18T04:21:19.2658402Z distributed/pipeline/sync/test_bugs 2022-05-18T04:21:19.2658732Z distributed/pipeline/sync/test_copy 2022-05-18T04:21:19.2659053Z distributed/pipeline/sync/test_dependency 2022-05-18T04:21:19.2659402Z distributed/pipeline/sync/test_microbatch 2022-05-18T04:21:19.2659735Z distributed/pipeline/sync/test_pipe 2022-05-18T04:21:19.2660048Z distributed/pipeline/sync/test_stream 2022-05-18T04:21:19.2660391Z 
distributed/pipeline/sync/test_worker 2022-05-18T04:21:19.2660727Z distributed/rpc/test_tensorpipe_agent 2022-05-18T04:21:19.2763370Z Prioritized test from test file changes. 2022-05-18T04:21:19.2763741Z reordering tests for PR: 2022-05-18T04:21:19.2764032Z prioritized: [] 2022-05-18T04:21:19.2768698Z the rest: ['distributed/rpc/cuda/test_tensorpipe_agent', 'distributed/fsdp/test_fsdp_core', 'distributed/test_c10d_nccl', 'distributed/fsdp/test_fsdp_mixed_precision', 'distributed/test_c10d_gloo', 'distributed/fsdp/test_fsdp_summon_full_params', 'distributed/fsdp/test_fsdp_state_dict', 'distributed/_shard/sharded_tensor/test_sharded_tensor', 'distributed/test_c10d_spawn_gloo', 'distributed/test_c10d_spawn_nccl', 'distributed/fsdp/test_wrap', 'distributed/algorithms/test_join', 'distributed/fsdp/test_fsdp_comm', 'distributed/test_c10d_common', 'distributed/fsdp/test_fsdp_meta', 'distributed/fsdp/test_fsdp_misc', 'distributed/_shard/checkpoint/test_checkpoint', 'distributed/fsdp/test_fsdp_checkpoint', 'distributed/_shard/checkpoint/test_file_system_checkpoint', 'distributed/fsdp/test_fsdp_apply', 'distributed/_shard/test_partial_tensor', 'distributed/fsdp/test_distributed_checkpoint', 'distributed/_shard/sharded_tensor/ops/test_binary_cmp', 'distributed/_shard/sharded_tensor/ops/test_elementwise_ops', 'distributed/elastic/timer/local_timer_test', 'distributed/test_data_parallel', 'distributed/fsdp/test_fsdp_multiple_wrapping', 'distributed/fsdp/test_fsdp_pure_fp16', 'distributed/_shard/sharded_tensor/ops/test_softmax', 'distributed/_shard/sharded_tensor/test_sharded_tensor_reshard', 'distributed/_shard/sharded_optim/test_sharded_optim', 'distributed/_shard/sharded_tensor/test_megatron_prototype', 'distributed/test_launcher', 'distributed/elastic/utils/util_test', 'distributed/elastic/metrics/api_test', 'distributed/fsdp/test_utils', 'distributed/_shard/sharded_tensor/ops/test_math_ops', 'distributed/_shard/test_replicated_tensor', 'distributed/elastic/events/lib_test', 'distributed/fsdp/test_shard_utils', 'distributed/pipeline/sync/skip/test_gpipe', 'distributed/pipeline/sync/skip/test_leak', 'distributed/pipeline/sync/skip/test_stash_pop', 'distributed/pipeline/sync/skip/test_verify_skippables', 'distributed/pipeline/sync/test_bugs', 'distributed/pipeline/sync/test_copy', 'distributed/pipeline/sync/test_dependency', 'distributed/pipeline/sync/test_microbatch', 'distributed/pipeline/sync/test_pipe', 'distributed/pipeline/sync/test_stream', 'distributed/pipeline/sync/test_worker', 'distributed/rpc/test_tensorpipe_agent'] 2022-05-18T04:21:19.2772379Z 2022-05-18T04:21:19.3299701Z Running distributed/rpc/cuda/test_tensorpipe_agent ... [2022-05-18 04:21:19.329571] 2022-05-18T04:21:19.3300466Z Executing ['/opt/conda/bin/python', 'distributed/rpc/cuda/test_tensorpipe_agent.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 04:21:19.329638] 2022-05-18T04:21:20.2285192Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_ojtmeb9 2022-05-18T04:21:20.2286792Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_ojtmeb9/_remote_module_non_scriptable.py 2022-05-18T04:21:20.5949534Z ]> 2022-05-18T04:21:20.5950190Z test_ddp_dist_autograd_local_vs_remote_gpu (__main__.TensorPipeCudaDdpComparisonTest) 2022-05-18T04:21:20.5951002Z , <__main__.TensorPipeCudaDistAutogradTest testMethod=test_gpu_to_cpu_continuation>, <__main__.TensorPipeCudaDistAutogradTest testMethod=test_gpu_to_cpu_continuation_gpu_root>]> 2022-05-18T04:21:20.5951757Z test_gpu_simple (__main__.TensorPipeCudaDistAutogradTest) 2022-05-18T04:21:20.5952199Z test_gpu_to_cpu_continuation (__main__.TensorPipeCudaDistAutogradTest) 2022-05-18T04:21:20.5952651Z test_gpu_to_cpu_continuation_gpu_root (__main__.TensorPipeCudaDistAutogradTest) 2022-05-18T04:21:20.5953568Z , <__main__.TensorPipeCudaRemoteModuleTest testMethod=test_input_moved_to_cuda_device_script>, <__main__.TensorPipeCudaRemoteModuleTest testMethod=test_invalid_devices>, <__main__.TensorPipeCudaRemoteModuleTest testMethod=test_valid_device>]> 2022-05-18T04:21:20.5954441Z test_input_moved_to_cuda_device (__main__.TensorPipeCudaRemoteModuleTest) 2022-05-18T04:21:20.5954916Z test_input_moved_to_cuda_device_script (__main__.TensorPipeCudaRemoteModuleTest) 2022-05-18T04:21:20.5955358Z test_invalid_devices (__main__.TensorPipeCudaRemoteModuleTest) 2022-05-18T04:21:20.5955783Z test_valid_device (__main__.TensorPipeCudaRemoteModuleTest) 2022-05-18T04:21:20.5956273Z ]> 2022-05-18T04:21:20.5956741Z test_profiler_remote_cuda (__main__.TensorPipeCudaRpcTest) 2022-05-18T04:21:20.5958006Z , <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_gloo_ckpt_except_last>, <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_gloo_ckpt_never>, <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_gloo_ckpt_never_find_unused>, <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_nccl_ckpt_always>, <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_nccl_ckpt_except_last>, <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_nccl_ckpt_never>, <__main__.TensorPipePipeWithDDPTest testMethod=test_basic_nccl_ckpt_never_find_unused>]> 2022-05-18T04:21:20.5959377Z test_basic_gloo_ckpt_always (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:21:20.5959818Z test_basic_gloo_ckpt_except_last (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:21:20.5960229Z test_basic_gloo_ckpt_never (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:21:20.5960873Z test_basic_gloo_ckpt_never_find_unused (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:21:20.5961665Z test_basic_nccl_ckpt_always (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:21:20.5962414Z test_basic_nccl_ckpt_except_last (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:21:20.5963222Z test_basic_nccl_ckpt_never (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:21:20.5964009Z test_basic_nccl_ckpt_never_find_unused (__main__.TensorPipePipeWithDDPTest) 2022-05-18T04:21:20.5980582Z , <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_async_execution_with_cuda_future>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_callback_changes_devices>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_can_extract_cuda_sparse_tensor>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_can_extract_cuda_tensor>, 
<__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_can_extract_custom_class_with_cuda_sparse_tensor>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_can_extract_custom_class_with_cuda_tensor>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_can_extract_list_with_cuda_sparse_tensor>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_can_extract_list_with_cuda_tensor>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_device_as_device>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_device_as_int>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_device_as_str>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_device_not_cuda>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_modify_tensor_inplace>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_replace_tensor>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_cuda_future_value_on_bad_device>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_custom_stream>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_custom_stream_multi>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_custom_stream_nested>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_custom_stream_nested_multi>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_cpu>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_cpu_to_gpu_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_cpu_to_gpu_non_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_default_to_non_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_1>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_2>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_3>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_4>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_5>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_6>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_7>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_8>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_1>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_2>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_3>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_4>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_5>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_6>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_7>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_mixed_self_8>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_non_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest 
testMethod=test_device_map_gpu_non_default_to_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_to_cpu_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_map_gpu_to_cpu_non_default>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_gpu>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_in_options>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_invalid_max_local_device>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_invalid_max_remote_device>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_invalid_min_device>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_many_to_one>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config_loop>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config_not_timeout>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config_remote>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config_remote_response>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config_response>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_missing_config_response_loop>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_multi_gpu>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_multi_gpu_self>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_one_to_many>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_remote>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_return_to_gpu>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_return_to_gpu_self>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_maps_wrong_worker_name>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_device_mismatch>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_devices_option_mismatch>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_devices_option_mismatch_reverse>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_meta_multiple_tensors>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_owner_rref_forward_synchronization1>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_owner_rref_forward_synchronization2>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_owner_rref_forward_synchronization3>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_owner_rref_forward_synchronization4>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_as_arg_synchronization1>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_as_arg_synchronization2>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_as_arg_synchronization3>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_as_arg_synchronization4>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_as_arg_synchronization5>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_forward_synchronization1>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest 
testMethod=test_rref_forward_synchronization2>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_forward_synchronization3>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_forward_synchronization4>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_to_here_synchronization1>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_to_here_synchronization2>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_to_here_synchronization3>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_to_here_synchronization4>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_rref_with_unpickleable_attributes>, <__main__.TensorPipeTensorPipeAgentCudaRpcTest testMethod=test_tensor_view_as_return_value>]> 2022-05-18T04:21:20.5994691Z test_async_execution_nested_with_cuda_future (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.5995227Z test_async_execution_with_cuda_future (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.5995816Z test_cuda_future_callback_changes_devices (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.5996365Z test_cuda_future_can_extract_cuda_sparse_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.5996880Z test_cuda_future_can_extract_cuda_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.5997430Z test_cuda_future_can_extract_custom_class_with_cuda_sparse_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.5998002Z test_cuda_future_can_extract_custom_class_with_cuda_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.5998566Z test_cuda_future_can_extract_list_with_cuda_sparse_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.5999107Z test_cuda_future_can_extract_list_with_cuda_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.5999626Z test_cuda_future_device_as_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6000121Z test_cuda_future_device_as_int (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6000596Z test_cuda_future_device_as_str (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6001095Z test_cuda_future_device_not_cuda (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6001604Z test_cuda_future_modify_tensor_inplace (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6002109Z test_cuda_future_replace_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6002593Z test_cuda_future_value_on_bad_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6003075Z test_custom_stream (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6003537Z test_custom_stream_multi (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6003995Z test_custom_stream_nested (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6004480Z test_custom_stream_nested_multi (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6004963Z test_device_map_cpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6005451Z test_device_map_cpu_to_gpu_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6005944Z test_device_map_cpu_to_gpu_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6006443Z test_device_map_gpu_default 
(__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6006950Z test_device_map_gpu_default_to_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6007435Z test_device_map_gpu_mixed_1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6007919Z test_device_map_gpu_mixed_2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6008390Z test_device_map_gpu_mixed_3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6008858Z test_device_map_gpu_mixed_4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6009311Z test_device_map_gpu_mixed_5 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6009788Z test_device_map_gpu_mixed_6 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6010535Z test_device_map_gpu_mixed_7 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6011163Z test_device_map_gpu_mixed_8 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6011654Z test_device_map_gpu_mixed_self_1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6012145Z test_device_map_gpu_mixed_self_2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6012637Z test_device_map_gpu_mixed_self_3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6013216Z test_device_map_gpu_mixed_self_4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6013693Z test_device_map_gpu_mixed_self_5 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6014177Z test_device_map_gpu_mixed_self_6 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6014706Z test_device_map_gpu_mixed_self_7 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6015200Z test_device_map_gpu_mixed_self_8 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6015690Z test_device_map_gpu_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6016203Z test_device_map_gpu_non_default_to_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6016702Z test_device_map_gpu_to_cpu_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6017214Z test_device_map_gpu_to_cpu_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6017711Z test_device_maps_gpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6018173Z test_device_maps_in_options (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6018678Z test_device_maps_invalid_max_local_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6019237Z test_device_maps_invalid_max_remote_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6019767Z test_device_maps_invalid_min_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6020246Z test_device_maps_many_to_one (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6020738Z test_device_maps_missing_config (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6021241Z test_device_maps_missing_config_loop (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6021756Z test_device_maps_missing_config_not_timeout (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6022264Z test_device_maps_missing_config_remote (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6022791Z test_device_maps_missing_config_remote_response 
(__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6023317Z test_device_maps_missing_config_response (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6023832Z test_device_maps_missing_config_response_loop (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6024342Z test_device_maps_multi_gpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6024831Z test_device_maps_multi_gpu_self (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6025315Z test_device_maps_one_to_many (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6025777Z test_device_maps_remote (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6026257Z test_device_maps_return_to_gpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6026759Z test_device_maps_return_to_gpu_self (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6027245Z test_device_maps_wrong_worker_name (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6027731Z test_device_mismatch (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6028208Z test_devices_option_mismatch (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6028712Z test_devices_option_mismatch_reverse (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6029189Z test_meta_multiple_tensors (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6029696Z test_owner_rref_forward_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6030224Z test_owner_rref_forward_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6030726Z test_owner_rref_forward_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6031324Z test_owner_rref_forward_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6031836Z test_rref_as_arg_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6032337Z test_rref_as_arg_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6032817Z test_rref_as_arg_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6033362Z test_rref_as_arg_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6033865Z test_rref_as_arg_synchronization5 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6034349Z test_rref_forward_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6034862Z test_rref_forward_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6035371Z test_rref_forward_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6035879Z test_rref_forward_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6036370Z test_rref_to_here_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6036872Z test_rref_to_here_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6037374Z test_rref_to_here_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6037859Z test_rref_to_here_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6038373Z test_rref_with_unpickleable_attributes (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6038881Z 
test_tensor_view_as_return_value (__main__.TensorPipeTensorPipeAgentCudaRpcTest) 2022-05-18T04:21:20.6039783Z , <__main__.TensorPipeTensorPipeCudaDistAutogradTest testMethod=test_dist_autograd_sync_streams>, <__main__.TensorPipeTensorPipeCudaDistAutogradTest testMethod=test_gradients_synchronizations>]> 2022-05-18T04:21:20.6040672Z test_device_maps_backward_pass (__main__.TensorPipeTensorPipeCudaDistAutogradTest) 2022-05-18T04:21:20.6041197Z test_dist_autograd_sync_streams (__main__.TensorPipeTensorPipeCudaDistAutogradTest) 2022-05-18T04:21:20.6041734Z test_gradients_synchronizations (__main__.TensorPipeTensorPipeCudaDistAutogradTest) 2022-05-18T04:21:21.4839677Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpebnf6n6k 2022-05-18T04:21:21.4841626Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpebnf6n6k/_remote_module_non_scriptable.py 2022-05-18T04:21:21.8604455Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:21:21.8621026Z 2022-05-18T04:21:21.8621292Z Running tests... 2022-05-18T04:21:21.8621714Z ---------------------------------------------------------------------- 2022-05-18T04:21:23.5187796Z test_ddp_dist_autograd_local_vs_remote_gpu (__main__.TensorPipeCudaDdpComparisonTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:21:23.5806002Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 314 2022-05-18T04:21:23.5905877Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 315 2022-05-18T04:21:23.6005304Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 316 2022-05-18T04:21:23.6103821Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 317 2022-05-18T04:21:24.5061892Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe35fa7dj 2022-05-18T04:21:24.5063078Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe35fa7dj/_remote_module_non_scriptable.py 2022-05-18T04:21:24.5153966Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg_wdft8y 2022-05-18T04:21:24.5156033Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg_wdft8y/_remote_module_non_scriptable.py 2022-05-18T04:21:24.5486369Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpww717wvl 2022-05-18T04:21:24.5488467Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpww717wvl/_remote_module_non_scriptable.py 2022-05-18T04:21:24.5615474Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpatlocrxm 2022-05-18T04:21:24.5618468Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpatlocrxm/_remote_module_non_scriptable.py 2022-05-18T04:21:24.8606839Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:21:24.8746748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:21:24.9077543Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:24.9299148Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:25.1158330Z skip: Need at least 4 CUDA devices (3.253s) 2022-05-18T04:21:25.1158537Z 2022-05-18T04:21:25.1158934Z ---------------------------------------------------------------------- 2022-05-18T04:21:25.1159288Z Ran 1 test in 3.254s 2022-05-18T04:21:25.1159463Z 
2022-05-18T04:21:25.1159578Z OK (skipped=1) 2022-05-18T04:21:25.1159740Z 2022-05-18T04:21:25.1159849Z Generating XML reports... 2022-05-18T04:21:25.1204080Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDdpComparisonTest-20220518042121.xml 2022-05-18T04:21:26.2706434Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4eyxvg95 2022-05-18T04:21:26.2707457Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4eyxvg95/_remote_module_non_scriptable.py 2022-05-18T04:21:26.6262783Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:21:26.6277107Z 2022-05-18T04:21:26.6277522Z Running tests... 2022-05-18T04:21:26.6278171Z ---------------------------------------------------------------------- 2022-05-18T04:21:28.2389092Z test_gpu_simple (__main__.TensorPipeCudaDistAutogradTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:21:28.3016213Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 495 2022-05-18T04:21:28.3114251Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 496 2022-05-18T04:21:28.3214724Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 497 2022-05-18T04:21:28.3313788Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 498 2022-05-18T04:21:29.1943651Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwxphtlsu 2022-05-18T04:21:29.1944880Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwxphtlsu/_remote_module_non_scriptable.py 2022-05-18T04:21:29.2008071Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwogegbwn 2022-05-18T04:21:29.2010880Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwogegbwn/_remote_module_non_scriptable.py 2022-05-18T04:21:29.2067451Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp99m_4ssn 2022-05-18T04:21:29.2070259Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp99m_4ssn/_remote_module_non_scriptable.py 2022-05-18T04:21:29.2082013Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgszsv1wh 2022-05-18T04:21:29.2084739Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgszsv1wh/_remote_module_non_scriptable.py 2022-05-18T04:21:29.5502531Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:21:29.5592331Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:29.5639632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:21:29.5798833Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:31.8415261Z ok (5.213s) 2022-05-18T04:21:31.8415692Z 2022-05-18T04:21:31.8416474Z ---------------------------------------------------------------------- 2022-05-18T04:21:31.8416998Z Ran 1 test in 5.214s 2022-05-18T04:21:31.8417167Z 2022-05-18T04:21:31.8417264Z OK 2022-05-18T04:21:31.8417382Z 2022-05-18T04:21:31.8417818Z Generating XML reports... 
2022-05-18T04:21:31.8460002Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDistAutogradTest-20220518042126.xml 2022-05-18T04:21:32.9819285Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu9x27tu6 2022-05-18T04:21:32.9820544Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu9x27tu6/_remote_module_non_scriptable.py 2022-05-18T04:21:33.3538183Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:21:33.3552750Z 2022-05-18T04:21:33.3553012Z Running tests... 2022-05-18T04:21:33.3553441Z ---------------------------------------------------------------------- 2022-05-18T04:21:35.0181458Z test_gpu_to_cpu_continuation (__main__.TensorPipeCudaDistAutogradTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:21:35.0845095Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 924 2022-05-18T04:21:35.0947711Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 925 2022-05-18T04:21:35.1051042Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 926 2022-05-18T04:21:35.1154989Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 927 2022-05-18T04:21:35.9878099Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdnt43_m0 2022-05-18T04:21:35.9879281Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdnt43_m0/_remote_module_non_scriptable.py 2022-05-18T04:21:36.0354280Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjfq6vy2g 2022-05-18T04:21:36.0356069Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjfq6vy2g/_remote_module_non_scriptable.py 2022-05-18T04:21:36.0398419Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyhttk6ks 2022-05-18T04:21:36.0400490Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyhttk6ks/_remote_module_non_scriptable.py 2022-05-18T04:21:36.0628279Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxwlrbker 2022-05-18T04:21:36.0630285Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxwlrbker/_remote_module_non_scriptable.py 2022-05-18T04:21:36.3578139Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:36.3861151Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:21:36.4009250Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:36.4197079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:21:38.7262546Z ok (5.371s) 2022-05-18T04:21:38.7262787Z 2022-05-18T04:21:38.7263189Z ---------------------------------------------------------------------- 2022-05-18T04:21:38.7263549Z Ran 1 test in 5.371s 2022-05-18T04:21:38.7263724Z 2022-05-18T04:21:38.7263845Z OK 2022-05-18T04:21:38.7263995Z 2022-05-18T04:21:38.7264138Z Generating XML reports... 
2022-05-18T04:21:38.7309020Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDistAutogradTest-20220518042133.xml 2022-05-18T04:21:39.8588746Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpddge_ok0 2022-05-18T04:21:39.8589726Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpddge_ok0/_remote_module_non_scriptable.py 2022-05-18T04:21:40.2312621Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:21:40.2328079Z 2022-05-18T04:21:40.2328267Z Running tests... 2022-05-18T04:21:40.2328708Z ---------------------------------------------------------------------- 2022-05-18T04:21:41.8729906Z test_gpu_to_cpu_continuation_gpu_root (__main__.TensorPipeCudaDistAutogradTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:21:41.9350777Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1353 2022-05-18T04:21:41.9449771Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1354 2022-05-18T04:21:41.9551988Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 1355 2022-05-18T04:21:41.9654800Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 1356 2022-05-18T04:21:42.8279240Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu0mnhq35 2022-05-18T04:21:42.8280159Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu0mnhq35/_remote_module_non_scriptable.py 2022-05-18T04:21:42.8509679Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe8lge5_h 2022-05-18T04:21:42.8512068Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe8lge5_h/_remote_module_non_scriptable.py 2022-05-18T04:21:42.8530700Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfclcxfva 2022-05-18T04:21:42.8533498Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfclcxfva/_remote_module_non_scriptable.py 2022-05-18T04:21:42.8564768Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdxfj8ls3 2022-05-18T04:21:42.8567573Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdxfj8ls3/_remote_module_non_scriptable.py 2022-05-18T04:21:43.1841431Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:21:43.2071460Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:21:43.2071959Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:43.2221046Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:45.5761436Z ok (5.343s) 2022-05-18T04:21:45.5761757Z 2022-05-18T04:21:45.5762206Z ---------------------------------------------------------------------- 2022-05-18T04:21:45.5762557Z Ran 1 test in 5.343s 2022-05-18T04:21:45.5762729Z 2022-05-18T04:21:45.5762807Z OK 2022-05-18T04:21:45.5762947Z 2022-05-18T04:21:45.5763087Z Generating XML reports... 
2022-05-18T04:21:45.5806939Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDistAutogradTest-20220518042140.xml 2022-05-18T04:21:46.7496382Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpitjm64xw 2022-05-18T04:21:46.7498069Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpitjm64xw/_remote_module_non_scriptable.py 2022-05-18T04:21:47.1233526Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:21:47.1248712Z 2022-05-18T04:21:47.1248853Z Running tests... 2022-05-18T04:21:47.1249581Z ---------------------------------------------------------------------- 2022-05-18T04:21:48.7753695Z test_input_moved_to_cuda_device (__main__.TensorPipeCudaRemoteModuleTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:21:48.8390797Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1782 2022-05-18T04:21:48.8492715Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1783 2022-05-18T04:21:49.7165394Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqsgpmriz 2022-05-18T04:21:49.7166617Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqsgpmriz/_remote_module_non_scriptable.py 2022-05-18T04:21:49.7346193Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpydca6jj2 2022-05-18T04:21:49.7349075Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpydca6jj2/_remote_module_non_scriptable.py 2022-05-18T04:21:50.0702540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:50.0993848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:51.8578514Z ok (4.733s) 2022-05-18T04:21:51.8578952Z 2022-05-18T04:21:51.8579729Z ---------------------------------------------------------------------- 2022-05-18T04:21:51.8580161Z Ran 1 test in 4.733s 2022-05-18T04:21:51.8580334Z 2022-05-18T04:21:51.8580440Z OK 2022-05-18T04:21:51.8580583Z 2022-05-18T04:21:51.8580724Z Generating XML reports... 2022-05-18T04:21:51.8625890Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518042147.xml 2022-05-18T04:21:53.0167452Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbdzhh884 2022-05-18T04:21:53.0168371Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbdzhh884/_remote_module_non_scriptable.py 2022-05-18T04:21:53.3744593Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:21:53.3759217Z 2022-05-18T04:21:53.3759640Z Running tests... 2022-05-18T04:21:53.3760139Z ---------------------------------------------------------------------- 2022-05-18T04:21:54.9697988Z test_input_moved_to_cuda_device_script (__main__.TensorPipeCudaRemoteModuleTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:21:55.0338136Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 1976 2022-05-18T04:21:55.0442743Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 1977 2022-05-18T04:21:55.9158194Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpopu7z_ms 2022-05-18T04:21:55.9159152Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpopu7z_ms/_remote_module_non_scriptable.py 2022-05-18T04:21:55.9313137Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgg3v6lxa 2022-05-18T04:21:55.9316059Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgg3v6lxa/_remote_module_non_scriptable.py 2022-05-18T04:21:56.2685083Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:21:56.2985870Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:21:56.4803000Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgg3v6lxa/_remote_module___torch___torch_testing__internal_distributed_nn_api_remote_module_test_MyModuleInterface.py 2022-05-18T04:21:56.4804803Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpopu7z_ms/_remote_module___torch___torch_testing__internal_distributed_nn_api_remote_module_test_MyModuleInterface.py 2022-05-18T04:21:56.4888467Z INFO:torch.distributed.nn.jit.instantiator:Skipped writing /tmp/tmpopu7z_ms/_remote_module___torch___torch_testing__internal_distributed_nn_api_remote_module_test_MyModuleInterface.py 2022-05-18T04:21:58.1526270Z ok (4.776s) 2022-05-18T04:21:58.1526491Z 2022-05-18T04:21:58.1526905Z ---------------------------------------------------------------------- 2022-05-18T04:21:58.1527266Z Ran 1 test in 4.777s 2022-05-18T04:21:58.1527455Z 2022-05-18T04:21:58.1527553Z OK 2022-05-18T04:21:58.1527673Z 2022-05-18T04:21:58.1527810Z Generating XML reports... 2022-05-18T04:21:58.1570377Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518042153.xml 2022-05-18T04:21:59.2715431Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphe0ho6pi 2022-05-18T04:21:59.2716882Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphe0ho6pi/_remote_module_non_scriptable.py 2022-05-18T04:21:59.6439450Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:21:59.6455259Z 2022-05-18T04:21:59.6455752Z Running tests... 2022-05-18T04:21:59.6456262Z ---------------------------------------------------------------------- 2022-05-18T04:22:01.2974464Z test_invalid_devices (__main__.TensorPipeCudaRemoteModuleTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:22:01.3618255Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2186 2022-05-18T04:22:01.3718487Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2187 2022-05-18T04:22:02.2403504Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpew7eccsf 2022-05-18T04:22:02.2404328Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpew7eccsf/_remote_module_non_scriptable.py 2022-05-18T04:22:02.2517478Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnyxjwgma 2022-05-18T04:22:02.2520409Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnyxjwgma/_remote_module_non_scriptable.py 2022-05-18T04:22:02.5944894Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:02.6211409Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:02.8009508Z On WorkerInfo(id=1, name=worker1): 2022-05-18T04:22:02.8031981Z RuntimeError('CUDA error: invalid device ordinal\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nException raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:33 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7fe2f99c11eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #1: + 0x14814 (0x7fe30285e814 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)\nframe #2: + 0xf044f6 (0x7fe2fab044f6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #3: + 0x29fed64 (0x7fe2fc5fed64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #4: + 0x29fee4b (0x7fe2fc5fee4b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #5: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x10f (0x7fe3042a1c6f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x1a65005 (0x7fe304503005 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #7: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x174 (0x7fe3042e0a64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #8: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x12da (0x7fe303d221ca in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x1bff1b3 (0x7fe30469d1b3 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7fe304062b0d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: + 0x1a67821 (0x7fe304505821 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7fe304062b0d in 
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #13: + 0x2a27dde (0x7fe3054c5dde in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #14: + 0x2a2835b (0x7fe3054c635b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #15: at::_ops::_to_copy::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x202 (0x7fe3040d7aa2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #16: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x13e (0x7fe303d193be in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #17: + 0x1cf01e9 (0x7fe30478e1e9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #18: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x216 (0x7fe3041ecdc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #19: + 0x31e570 (0x7fe3103c7570 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #20: + 0x31ea25 (0x7fe3103c7a25 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #21: _PyMethodDescr_FastCallKeywords + 0x330 (0x562a16617fa0 in /opt/conda/bin/python)\nframe #22: + 0x17faae (0x562a16618aae in /opt/conda/bin/python)\nframe #23: _PyEval_EvalFrameDefault + 0x661 (0x562a1665c601 in /opt/conda/bin/python)\nframe #24: _PyEval_EvalCodeWithName + 0xdf9 (0x562a165b2a29 in /opt/conda/bin/python)\nframe #25: _PyFunction_FastCallKeywords + 0x583 (0x562a165d1cd3 in /opt/conda/bin/python)\nframe #26: _PyEval_EvalFrameDefault + 0x3f5 (0x562a1665c395 in /opt/conda/bin/python)\nframe #27: _PyFunction_FastCallKeywords + 0x187 (0x562a165d18d7 in /opt/conda/bin/python)\nframe #28: + 0x17f9c5 (0x562a166189c5 in /opt/conda/bin/python)\nframe #29: _PyEval_EvalFrameDefault + 0x4762 (0x562a16660702 in /opt/conda/bin/python)\nframe #30: _PyEval_EvalCodeWithName + 0xdf9 (0x562a165b2a29 in /opt/conda/bin/python)\nframe #31: _PyFunction_FastCallKeywords + 0x583 (0x562a165d1cd3 in /opt/conda/bin/python)\nframe #32: + 0x17f9c5 (0x562a166189c5 in /opt/conda/bin/python)\nframe #33: _PyEval_EvalFrameDefault + 0x4762 (0x562a16660702 in /opt/conda/bin/python)\nframe #34: _PyFunction_FastCallDict + 0x118 (0x562a165d0cf8 in /opt/conda/bin/python)\nframe #35: _PyEval_EvalFrameDefault + 0x1cb8 (0x562a1665dc58 in /opt/conda/bin/python)\nframe #36: _PyFunction_FastCallDict + 0x118 (0x562a165d0cf8 in /opt/conda/bin/python)\nframe #37: + 0x9839ef (0x7fe310a2c9ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #38: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7fe310a2b39d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #39: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7fe310a2dd83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #40: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x83 (0x7fe310a2e3d3 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #41: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, 
torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7fe30675ab24 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #42: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7fe310a2db75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #43: + 0x3cb5e23 (0x7fe306753e23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #44: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7fe306754a18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #45: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7fe30674f097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #46: + 0x3ce5b22 (0x7fe306783b22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #47: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7fe2f99ad4bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #48: + 0xc9039 (0x7fe31dbb4039 in /opt/conda/lib/libstdc++.so.6)\nframe #49: + 0x76ba (0x7fe33e4e36ba in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #50: clone + 0x6d (0x7fe33e21951d in /lib/x86_64-linux-gnu/libc.so.6)\n') 2022-05-18T04:22:02.8044531Z Traceback (most recent call last): 2022-05-18T04:22:02.8045096Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:22:02.8045566Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:22:02.8046154Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/nn/api/remote_module.py", line 89, in _create_module 2022-05-18T04:22:02.8046527Z module.to(device) 2022-05-18T04:22:02.8046990Z File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 927, in to 2022-05-18T04:22:02.8047497Z return self._apply(convert) 2022-05-18T04:22:02.8047988Z File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 602, in _apply 2022-05-18T04:22:02.8048364Z param_applied = fn(param) 2022-05-18T04:22:02.8048832Z File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 925, in convert 2022-05-18T04:22:02.8049310Z return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking) 2022-05-18T04:22:02.8049720Z RuntimeError: CUDA error: invalid device ordinal 2022-05-18T04:22:02.8050173Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. 2022-05-18T04:22:02.8050836Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1. 
2022-05-18T04:22:02.8051326Z Exception raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:33 (most recent call first): 2022-05-18T04:22:02.8052184Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7fe2f99c11eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:22:02.8052919Z frame #1: + 0x14814 (0x7fe30285e814 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10_cuda.so) 2022-05-18T04:22:02.8053569Z frame #2: + 0xf044f6 (0x7fe2fab044f6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:22:02.8054387Z frame #3: + 0x29fed64 (0x7fe2fc5fed64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:22:02.8055050Z frame #4: + 0x29fee4b (0x7fe2fc5fee4b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:22:02.8056122Z frame #5: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x10f (0x7fe3042a1c6f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8056975Z frame #6: + 0x1a65005 (0x7fe304503005 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8057887Z frame #7: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x174 (0x7fe3042e0a64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8058996Z frame #8: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x12da (0x7fe303d221ca in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8059809Z frame #9: + 0x1bff1b3 (0x7fe30469d1b3 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8060824Z frame #10: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7fe304062b0d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8061662Z frame #11: + 0x1a67821 (0x7fe304505821 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8062671Z frame #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7fe304062b0d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8063497Z frame #13: + 0x2a27dde (0x7fe3054c5dde in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8064142Z frame #14: + 0x2a2835b (0x7fe3054c635b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8065094Z frame #15: at::_ops::_to_copy::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x202 (0x7fe3040d7aa2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8066203Z frame #16: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x13e (0x7fe303d193be in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8067015Z frame #17: + 0x1cf01e9 (0x7fe30478e1e9 in 
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8067978Z frame #18: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x216 (0x7fe3041ecdc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8068810Z frame #19: + 0x31e570 (0x7fe3103c7570 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8069529Z frame #20: + 0x31ea25 (0x7fe3103c7a25 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8070019Z frame #21: _PyMethodDescr_FastCallKeywords + 0x330 (0x562a16617fa0 in /opt/conda/bin/python) 2022-05-18T04:22:02.8070434Z frame #22: + 0x17faae (0x562a16618aae in /opt/conda/bin/python) 2022-05-18T04:22:02.8070904Z frame #23: _PyEval_EvalFrameDefault + 0x661 (0x562a1665c601 in /opt/conda/bin/python) 2022-05-18T04:22:02.8071344Z frame #24: _PyEval_EvalCodeWithName + 0xdf9 (0x562a165b2a29 in /opt/conda/bin/python) 2022-05-18T04:22:02.8071770Z frame #25: _PyFunction_FastCallKeywords + 0x583 (0x562a165d1cd3 in /opt/conda/bin/python) 2022-05-18T04:22:02.8072187Z frame #26: _PyEval_EvalFrameDefault + 0x3f5 (0x562a1665c395 in /opt/conda/bin/python) 2022-05-18T04:22:02.8072612Z frame #27: _PyFunction_FastCallKeywords + 0x187 (0x562a165d18d7 in /opt/conda/bin/python) 2022-05-18T04:22:02.8073028Z frame #28: + 0x17f9c5 (0x562a166189c5 in /opt/conda/bin/python) 2022-05-18T04:22:02.8073425Z frame #29: _PyEval_EvalFrameDefault + 0x4762 (0x562a16660702 in /opt/conda/bin/python) 2022-05-18T04:22:02.8073852Z frame #30: _PyEval_EvalCodeWithName + 0xdf9 (0x562a165b2a29 in /opt/conda/bin/python) 2022-05-18T04:22:02.8074283Z frame #31: _PyFunction_FastCallKeywords + 0x583 (0x562a165d1cd3 in /opt/conda/bin/python) 2022-05-18T04:22:02.8074705Z frame #32: + 0x17f9c5 (0x562a166189c5 in /opt/conda/bin/python) 2022-05-18T04:22:02.8075095Z frame #33: _PyEval_EvalFrameDefault + 0x4762 (0x562a16660702 in /opt/conda/bin/python) 2022-05-18T04:22:02.8075512Z frame #34: _PyFunction_FastCallDict + 0x118 (0x562a165d0cf8 in /opt/conda/bin/python) 2022-05-18T04:22:02.8075935Z frame #35: _PyEval_EvalFrameDefault + 0x1cb8 (0x562a1665dc58 in /opt/conda/bin/python) 2022-05-18T04:22:02.8076354Z frame #36: _PyFunction_FastCallDict + 0x118 (0x562a165d0cf8 in /opt/conda/bin/python) 2022-05-18T04:22:02.8076955Z frame #37: + 0x9839ef (0x7fe310a2c9ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8077752Z frame #38: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7fe310a2b39d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8078770Z frame #39: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7fe310a2dd83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8079898Z frame #40: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x83 (0x7fe310a2e3d3 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8081111Z frame #41: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7fe30675ab24 in 
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8082366Z frame #42: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7fe310a2db75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8083247Z frame #43: + 0x3cb5e23 (0x7fe306753e23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8084178Z frame #44: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7fe306754a18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8085307Z frame #45: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7fe30674f097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8086098Z frame #46: + 0x3ce5b22 (0x7fe306783b22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8086815Z frame #47: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7fe2f99ad4bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:22:02.8087333Z frame #48: + 0xc9039 (0x7fe31dbb4039 in /opt/conda/lib/libstdc++.so.6) 2022-05-18T04:22:02.8087881Z frame #49: + 0x76ba (0x7fe33e4e36ba in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:22:02.8088391Z frame #50: clone + 0x6d (0x7fe33e21951d in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:22:02.8088618Z 2022-05-18T04:22:02.8088641Z 2022-05-18T04:22:02.8088764Z On WorkerInfo(id=1, name=worker1): 2022-05-18T04:22:02.8124734Z RuntimeError('On WorkerInfo(id=1, name=worker1):\nRuntimeError(\'CUDA error: invalid device ordinal\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nException raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:33 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7fe2f99c11eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #1: + 0x14814 (0x7fe30285e814 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)\nframe #2: + 0xf044f6 (0x7fe2fab044f6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #3: + 0x29fed64 (0x7fe2fc5fed64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #4: + 0x29fee4b (0x7fe2fc5fee4b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #5: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x10f (0x7fe3042a1c6f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x1a65005 (0x7fe304503005 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #7: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x174 (0x7fe3042e0a64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #8: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x12da (0x7fe303d221ca in 
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x1bff1b3 (0x7fe30469d1b3 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7fe304062b0d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: + 0x1a67821 (0x7fe304505821 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7fe304062b0d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #13: + 0x2a27dde (0x7fe3054c5dde in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #14: + 0x2a2835b (0x7fe3054c635b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #15: at::_ops::_to_copy::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x202 (0x7fe3040d7aa2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #16: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x13e (0x7fe303d193be in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #17: + 0x1cf01e9 (0x7fe30478e1e9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #18: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x216 (0x7fe3041ecdc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #19: + 0x31e570 (0x7fe3103c7570 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #20: + 0x31ea25 (0x7fe3103c7a25 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #21: _PyMethodDescr_FastCallKeywords + 0x330 (0x562a16617fa0 in /opt/conda/bin/python)\nframe #22: + 0x17faae (0x562a16618aae in /opt/conda/bin/python)\nframe #23: _PyEval_EvalFrameDefault + 0x661 (0x562a1665c601 in /opt/conda/bin/python)\nframe #24: _PyEval_EvalCodeWithName + 0xdf9 (0x562a165b2a29 in /opt/conda/bin/python)\nframe #25: _PyFunction_FastCallKeywords + 0x583 (0x562a165d1cd3 in /opt/conda/bin/python)\nframe #26: _PyEval_EvalFrameDefault + 0x3f5 (0x562a1665c395 in /opt/conda/bin/python)\nframe #27: _PyFunction_FastCallKeywords + 0x187 (0x562a165d18d7 in /opt/conda/bin/python)\nframe #28: + 0x17f9c5 (0x562a166189c5 in /opt/conda/bin/python)\nframe #29: _PyEval_EvalFrameDefault + 0x4762 (0x562a16660702 in /opt/conda/bin/python)\nframe #30: _PyEval_EvalCodeWithName + 0xdf9 (0x562a165b2a29 in /opt/conda/bin/python)\nframe #31: _PyFunction_FastCallKeywords + 0x583 (0x562a165d1cd3 in /opt/conda/bin/python)\nframe #32: + 0x17f9c5 (0x562a166189c5 in /opt/conda/bin/python)\nframe #33: _PyEval_EvalFrameDefault + 0x4762 (0x562a16660702 in /opt/conda/bin/python)\nframe #34: _PyFunction_FastCallDict + 0x118 (0x562a165d0cf8 in /opt/conda/bin/python)\nframe #35: _PyEval_EvalFrameDefault + 0x1cb8 (0x562a1665dc58 in /opt/conda/bin/python)\nframe #36: _PyFunction_FastCallDict + 0x118 (0x562a165d0cf8 in /opt/conda/bin/python)\nframe #37: + 0x9839ef (0x7fe310a2c9ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #38: 
torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7fe310a2b39d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #39: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7fe310a2dd83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #40: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x83 (0x7fe310a2e3d3 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #41: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7fe30675ab24 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #42: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7fe310a2db75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #43: + 0x3cb5e23 (0x7fe306753e23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #44: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7fe306754a18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #45: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7fe30674f097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #46: + 0x3ce5b22 (0x7fe306783b22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #47: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7fe2f99ad4bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #48: + 0xc9039 (0x7fe31dbb4039 in /opt/conda/lib/libstdc++.so.6)\nframe #49: + 0x76ba (0x7fe33e4e36ba in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #50: clone + 0x6d (0x7fe33e21951d in /lib/x86_64-linux-gnu/libc.so.6)\n\')\nTraceback (most recent call last):\n File "/opt/conda/lib/python3.7/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function\n result = python_udf.func(*python_udf.args, **python_udf.kwargs)\n File "/opt/conda/lib/python3.7/site-packages/torch/distributed/nn/api/remote_module.py", line 89, in _create_module\n module.to(device)\n File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 927, in to\n return self._apply(convert)\n File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 602, in _apply\n param_applied = fn(param)\n File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 925, in convert\n return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)\nRuntimeError: CUDA error: invalid device ordinal\nCUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nException raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:33 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7fe2f99c11eb in 
/opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #1: + 0x14814 (0x7fe30285e814 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)\nframe #2: + 0xf044f6 (0x7fe2fab044f6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #3: + 0x29fed64 (0x7fe2fc5fed64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #4: + 0x29fee4b (0x7fe2fc5fee4b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #5: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x10f (0x7fe3042a1c6f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x1a65005 (0x7fe304503005 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #7: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x174 (0x7fe3042e0a64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #8: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x12da (0x7fe303d221ca in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x1bff1b3 (0x7fe30469d1b3 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7fe304062b0d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: + 0x1a67821 (0x7fe304505821 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7fe304062b0d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #13: + 0x2a27dde (0x7fe3054c5dde in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #14: + 0x2a2835b (0x7fe3054c635b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #15: at::_ops::_to_copy::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x202 (0x7fe3040d7aa2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #16: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x13e (0x7fe303d193be in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #17: + 0x1cf01e9 (0x7fe30478e1e9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #18: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x216 (0x7fe3041ecdc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #19: + 0x31e570 (0x7fe3103c7570 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #20: + 0x31ea25 (0x7fe3103c7a25 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #21: _PyMethodDescr_FastCallKeywords + 0x330 (0x562a16617fa0 in /opt/conda/bin/python)\nframe #22: + 0x17faae (0x562a16618aae in /opt/conda/bin/python)\nframe #23: _PyEval_EvalFrameDefault + 0x661 (0x562a1665c601 in /opt/conda/bin/python)\nframe #24: 
_PyEval_EvalCodeWithName + 0xdf9 (0x562a165b2a29 in /opt/conda/bin/python)\nframe #25: _PyFunction_FastCallKeywords + 0x583 (0x562a165d1cd3 in /opt/conda/bin/python)\nframe #26: _PyEval_EvalFrameDefault + 0x3f5 (0x562a1665c395 in /opt/conda/bin/python)\nframe #27: _PyFunction_FastCallKeywords + 0x187 (0x562a165d18d7 in /opt/conda/bin/python)\nframe #28: + 0x17f9c5 (0x562a166189c5 in /opt/conda/bin/python)\nframe #29: _PyEval_EvalFrameDefault + 0x4762 (0x562a16660702 in /opt/conda/bin/python)\nframe #30: _PyEval_EvalCodeWithName + 0xdf9 (0x562a165b2a29 in /opt/conda/bin/python)\nframe #31: _PyFunction_FastCallKeywords + 0x583 (0x562a165d1cd3 in /opt/conda/bin/python)\nframe #32: + 0x17f9c5 (0x562a166189c5 in /opt/conda/bin/python)\nframe #33: _PyEval_EvalFrameDefault + 0x4762 (0x562a16660702 in /opt/conda/bin/python)\nframe #34: _PyFunction_FastCallDict + 0x118 (0x562a165d0cf8 in /opt/conda/bin/python)\nframe #35: _PyEval_EvalFrameDefault + 0x1cb8 (0x562a1665dc58 in /opt/conda/bin/python)\nframe #36: _PyFunction_FastCallDict + 0x118 (0x562a165d0cf8 in /opt/conda/bin/python)\nframe #37: + 0x9839ef (0x7fe310a2c9ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #38: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7fe310a2b39d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #39: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7fe310a2dd83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #40: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x83 (0x7fe310a2e3d3 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #41: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7fe30675ab24 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #42: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7fe310a2db75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #43: + 0x3cb5e23 (0x7fe306753e23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #44: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7fe306754a18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #45: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7fe30674f097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #46: + 0x3ce5b22 (0x7fe306783b22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #47: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7fe2f99ad4bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #48: + 0xc9039 (0x7fe31dbb4039 in /opt/conda/lib/libstdc++.so.6)\nframe #49: + 0x76ba (0x7fe33e4e36ba in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #50: clone + 0x6d (0x7fe33e21951d in /lib/x86_64-linux-gnu/libc.so.6)\n\n') 2022-05-18T04:22:02.8146329Z Traceback (most recent call last): 2022-05-18T04:22:02.8146876Z File 
"/opt/conda/lib/python3.7/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:22:02.8147327Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:22:02.8147779Z File "/tmp/tmphe0ho6pi/_remote_module_non_scriptable.py", line 47, in _remote_forward 2022-05-18T04:22:02.8148154Z module = module_rref.local_value() 2022-05-18T04:22:02.8148690Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/rpc/internal.py", line 220, in _handle_exception 2022-05-18T04:22:02.8149249Z raise result.exception_type(result.msg.encode("utf-8").decode("unicode_escape")) 2022-05-18T04:22:02.8149652Z RuntimeError: On WorkerInfo(id=1, name=worker1): 2022-05-18T04:22:02.8150055Z RuntimeError('CUDA error: invalid device ordinal 2022-05-18T04:22:02.8150490Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. 2022-05-18T04:22:02.8150945Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1. 2022-05-18T04:22:02.8151425Z Exception raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:33 (most recent call first): 2022-05-18T04:22:02.8152380Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7fe2f99c11eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:22:02.8153148Z frame #1: + 0x14814 (0x7fe30285e814 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10_cuda.so) 2022-05-18T04:22:02.8153804Z frame #2: + 0xf044f6 (0x7fe2fab044f6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:22:02.8154462Z frame #3: + 0x29fed64 (0x7fe2fc5fed64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:22:02.8155110Z frame #4: + 0x29fee4b (0x7fe2fc5fee4b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:22:02.8156098Z frame #5: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x10f (0x7fe3042a1c6f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8156931Z frame #6: + 0x1a65005 (0x7fe304503005 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8157859Z frame #7: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x174 (0x7fe3042e0a64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8158950Z frame #8: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x12da (0x7fe303d221ca in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8159755Z frame #9: + 0x1bff1b3 (0x7fe30469d1b3 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8160764Z frame #10: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7fe304062b0d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8161579Z frame #11: + 0x1a67821 (0x7fe304505821 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8162585Z frame #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, 
c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7fe304062b0d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8163432Z frame #13: + 0x2a27dde (0x7fe3054c5dde in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8164070Z frame #14: + 0x2a2835b (0x7fe3054c635b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8165022Z frame #15: at::_ops::_to_copy::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x202 (0x7fe3040d7aa2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8166118Z frame #16: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x13e (0x7fe303d193be in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8166997Z frame #17: + 0x1cf01e9 (0x7fe30478e1e9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8168018Z frame #18: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x216 (0x7fe3041ecdc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8168860Z frame #19: + 0x31e570 (0x7fe3103c7570 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8169504Z frame #20: + 0x31ea25 (0x7fe3103c7a25 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8169981Z frame #21: _PyMethodDescr_FastCallKeywords + 0x330 (0x562a16617fa0 in /opt/conda/bin/python) 2022-05-18T04:22:02.8170686Z frame #22: + 0x17faae (0x562a16618aae in /opt/conda/bin/python) 2022-05-18T04:22:02.8171111Z frame #23: _PyEval_EvalFrameDefault + 0x661 (0x562a1665c601 in /opt/conda/bin/python) 2022-05-18T04:22:02.8171538Z frame #24: _PyEval_EvalCodeWithName + 0xdf9 (0x562a165b2a29 in /opt/conda/bin/python) 2022-05-18T04:22:02.8171949Z frame #25: _PyFunction_FastCallKeywords + 0x583 (0x562a165d1cd3 in /opt/conda/bin/python) 2022-05-18T04:22:02.8172379Z frame #26: _PyEval_EvalFrameDefault + 0x3f5 (0x562a1665c395 in /opt/conda/bin/python) 2022-05-18T04:22:02.8172800Z frame #27: _PyFunction_FastCallKeywords + 0x187 (0x562a165d18d7 in /opt/conda/bin/python) 2022-05-18T04:22:02.8173203Z frame #28: + 0x17f9c5 (0x562a166189c5 in /opt/conda/bin/python) 2022-05-18T04:22:02.8173615Z frame #29: _PyEval_EvalFrameDefault + 0x4762 (0x562a16660702 in /opt/conda/bin/python) 2022-05-18T04:22:02.8174039Z frame #30: _PyEval_EvalCodeWithName + 0xdf9 (0x562a165b2a29 in /opt/conda/bin/python) 2022-05-18T04:22:02.8174470Z frame #31: _PyFunction_FastCallKeywords + 0x583 (0x562a165d1cd3 in /opt/conda/bin/python) 2022-05-18T04:22:02.8174877Z frame #32: + 0x17f9c5 (0x562a166189c5 in /opt/conda/bin/python) 2022-05-18T04:22:02.8175282Z frame #33: _PyEval_EvalFrameDefault + 0x4762 (0x562a16660702 in /opt/conda/bin/python) 2022-05-18T04:22:02.8175698Z frame #34: _PyFunction_FastCallDict + 0x118 (0x562a165d0cf8 in /opt/conda/bin/python) 2022-05-18T04:22:02.8176108Z frame #35: _PyEval_EvalFrameDefault + 0x1cb8 (0x562a1665dc58 in /opt/conda/bin/python) 2022-05-18T04:22:02.8176532Z frame #36: _PyFunction_FastCallDict + 0x118 (0x562a165d0cf8 in /opt/conda/bin/python) 2022-05-18T04:22:02.8177150Z frame #37: + 0x9839ef (0x7fe310a2c9ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 
2022-05-18T04:22:02.8177942Z frame #38: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7fe310a2b39d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8178931Z frame #39: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7fe310a2dd83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8180065Z frame #40: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x83 (0x7fe310a2e3d3 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8181282Z frame #41: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7fe30675ab24 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8182545Z frame #42: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7fe310a2db75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8183525Z frame #43: + 0x3cb5e23 (0x7fe306753e23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8184512Z frame #44: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7fe306754a18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8185586Z frame #45: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7fe30674f097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8186351Z frame #46: + 0x3ce5b22 (0x7fe306783b22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8187040Z frame #47: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7fe2f99ad4bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:22:02.8187539Z frame #48: + 0xc9039 (0x7fe31dbb4039 in /opt/conda/lib/libstdc++.so.6) 2022-05-18T04:22:02.8188090Z frame #49: + 0x76ba (0x7fe33e4e36ba in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:22:02.8188586Z frame #50: clone + 0x6d (0x7fe33e21951d in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:22:02.8188909Z ') 2022-05-18T04:22:02.8189162Z Traceback (most recent call last): 2022-05-18T04:22:02.8189667Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:22:02.8190129Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:22:02.8190705Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/nn/api/remote_module.py", line 89, in _create_module 2022-05-18T04:22:02.8191090Z module.to(device) 2022-05-18T04:22:02.8191529Z File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 927, in to 2022-05-18T04:22:02.8191894Z return self._apply(convert) 2022-05-18T04:22:02.8192372Z File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 602, in _apply 2022-05-18T04:22:02.8192729Z param_applied = fn(param) 2022-05-18T04:22:02.8193208Z File 
"/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 925, in convert 2022-05-18T04:22:02.8193676Z return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking) 2022-05-18T04:22:02.8194061Z RuntimeError: CUDA error: invalid device ordinal 2022-05-18T04:22:02.8194519Z CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. 2022-05-18T04:22:02.8194971Z For debugging consider passing CUDA_LAUNCH_BLOCKING=1. 2022-05-18T04:22:02.8195453Z Exception raised from exchangeDevice at /var/lib/jenkins/workspace/c10/cuda/impl/CUDAGuardImpl.h:33 (most recent call first): 2022-05-18T04:22:02.8196273Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7fe2f99c11eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:22:02.8197000Z frame #1: + 0x14814 (0x7fe30285e814 in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10_cuda.so) 2022-05-18T04:22:02.8197640Z frame #2: + 0xf044f6 (0x7fe2fab044f6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:22:02.8198295Z frame #3: + 0x29fed64 (0x7fe2fc5fed64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:22:02.8198929Z frame #4: + 0x29fee4b (0x7fe2fc5fee4b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:22:02.8200002Z frame #5: at::_ops::empty_strided::redispatch(c10::DispatchKeySet, c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x10f (0x7fe3042a1c6f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8200882Z frame #6: + 0x1a65005 (0x7fe304503005 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8201818Z frame #7: at::_ops::empty_strided::call(c10::ArrayRef, c10::ArrayRef, c10::optional, c10::optional, c10::optional, c10::optional) + 0x174 (0x7fe3042e0a64 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8202919Z frame #8: at::native::_to_copy(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x12da (0x7fe303d221ca in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8203728Z frame #9: + 0x1bff1b3 (0x7fe30469d1b3 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8204723Z frame #10: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7fe304062b0d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8205562Z frame #11: + 0x1a67821 (0x7fe304505821 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8206563Z frame #12: at::_ops::_to_copy::redispatch(c10::DispatchKeySet, at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x10d (0x7fe304062b0d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8207405Z frame #13: + 0x2a27dde (0x7fe3054c5dde in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8208044Z frame #14: + 0x2a2835b (0x7fe3054c635b in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8208977Z frame #15: at::_ops::_to_copy::call(at::Tensor 
const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, c10::optional) + 0x202 (0x7fe3040d7aa2 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8210091Z frame #16: at::native::to(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x13e (0x7fe303d193be in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8211631Z frame #17: + 0x1cf01e9 (0x7fe30478e1e9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8212618Z frame #18: at::_ops::to_dtype_layout::call(at::Tensor const&, c10::optional, c10::optional, c10::optional, c10::optional, bool, bool, c10::optional) + 0x216 (0x7fe3041ecdc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8213443Z frame #19: + 0x31e570 (0x7fe3103c7570 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8214074Z frame #20: + 0x31ea25 (0x7fe3103c7a25 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8214565Z frame #21: _PyMethodDescr_FastCallKeywords + 0x330 (0x562a16617fa0 in /opt/conda/bin/python) 2022-05-18T04:22:02.8214997Z frame #22: + 0x17faae (0x562a16618aae in /opt/conda/bin/python) 2022-05-18T04:22:02.8215517Z frame #23: _PyEval_EvalFrameDefault + 0x661 (0x562a1665c601 in /opt/conda/bin/python) 2022-05-18T04:22:02.8215929Z frame #24: _PyEval_EvalCodeWithName + 0xdf9 (0x562a165b2a29 in /opt/conda/bin/python) 2022-05-18T04:22:02.8216363Z frame #25: _PyFunction_FastCallKeywords + 0x583 (0x562a165d1cd3 in /opt/conda/bin/python) 2022-05-18T04:22:02.8216850Z frame #26: _PyEval_EvalFrameDefault + 0x3f5 (0x562a1665c395 in /opt/conda/bin/python) 2022-05-18T04:22:02.8217269Z frame #27: _PyFunction_FastCallKeywords + 0x187 (0x562a165d18d7 in /opt/conda/bin/python) 2022-05-18T04:22:02.8217688Z frame #28: + 0x17f9c5 (0x562a166189c5 in /opt/conda/bin/python) 2022-05-18T04:22:02.8218095Z frame #29: _PyEval_EvalFrameDefault + 0x4762 (0x562a16660702 in /opt/conda/bin/python) 2022-05-18T04:22:02.8218514Z frame #30: _PyEval_EvalCodeWithName + 0xdf9 (0x562a165b2a29 in /opt/conda/bin/python) 2022-05-18T04:22:02.8218925Z frame #31: _PyFunction_FastCallKeywords + 0x583 (0x562a165d1cd3 in /opt/conda/bin/python) 2022-05-18T04:22:02.8219346Z frame #32: + 0x17f9c5 (0x562a166189c5 in /opt/conda/bin/python) 2022-05-18T04:22:02.8219752Z frame #33: _PyEval_EvalFrameDefault + 0x4762 (0x562a16660702 in /opt/conda/bin/python) 2022-05-18T04:22:02.8220167Z frame #34: _PyFunction_FastCallDict + 0x118 (0x562a165d0cf8 in /opt/conda/bin/python) 2022-05-18T04:22:02.8220576Z frame #35: _PyEval_EvalFrameDefault + 0x1cb8 (0x562a1665dc58 in /opt/conda/bin/python) 2022-05-18T04:22:02.8221000Z frame #36: _PyFunction_FastCallDict + 0x118 (0x562a165d0cf8 in /opt/conda/bin/python) 2022-05-18T04:22:02.8221614Z frame #37: + 0x9839ef (0x7fe310a2c9ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8222392Z frame #38: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7fe310a2b39d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8223400Z frame #39: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7fe310a2dd83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8224527Z 
frame #40: torch::distributed::rpc::RequestCallbackImpl::processPythonRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x83 (0x7fe310a2e3d3 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8225740Z frame #41: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x194 (0x7fe30675ab24 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8227000Z frame #42: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7fe310a2db75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:22:02.8227892Z frame #43: + 0x3cb5e23 (0x7fe306753e23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8228807Z frame #44: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7fe306754a18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8229872Z frame #45: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7fe30674f097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8230721Z frame #46: + 0x3ce5b22 (0x7fe306783b22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:22:02.8231399Z frame #47: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7fe2f99ad4bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:22:02.8231900Z frame #48: + 0xc9039 (0x7fe31dbb4039 in /opt/conda/lib/libstdc++.so.6) 2022-05-18T04:22:02.8232476Z frame #49: + 0x76ba (0x7fe33e4e36ba in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:22:02.8232991Z frame #50: clone + 0x6d (0x7fe33e21951d in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:22:02.8233217Z 2022-05-18T04:22:02.8233236Z 2022-05-18T04:22:02.8233257Z 2022-05-18T04:22:02.9771665Z ok (3.331s) 2022-05-18T04:22:02.9772017Z 2022-05-18T04:22:02.9772576Z ---------------------------------------------------------------------- 2022-05-18T04:22:02.9772913Z Ran 1 test in 3.332s 2022-05-18T04:22:02.9773081Z 2022-05-18T04:22:02.9773202Z OK 2022-05-18T04:22:02.9773341Z 2022-05-18T04:22:02.9773478Z Generating XML reports... 2022-05-18T04:22:02.9816141Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518042159.xml 2022-05-18T04:22:04.1505226Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4xql7_4g 2022-05-18T04:22:04.1507306Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4xql7_4g/_remote_module_non_scriptable.py 2022-05-18T04:22:04.5234016Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:22:04.5249006Z 2022-05-18T04:22:04.5249442Z Running tests... 2022-05-18T04:22:04.5249909Z ---------------------------------------------------------------------- 2022-05-18T04:22:06.1620733Z test_valid_device (__main__.TensorPipeCudaRemoteModuleTest) ... 
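Note: the worker-side traceback above ends in `RuntimeError: CUDA error: invalid device ordinal`, raised when the remote module calls `module.to(device)` with a CUDA index that does not exist on that host (here the error is expected and the test still reports `ok`). A minimal sketch of a guard against this situation, using only standard PyTorch APIs; the helper name is hypothetical and is not part of the test suite logged here:

```python
import torch

def move_to_device_safely(module: torch.nn.Module, device: str) -> torch.nn.Module:
    # Hypothetical helper: validate a CUDA device index before module.to(device),
    # the call that produced "CUDA error: invalid device ordinal" in the log above.
    dev = torch.device(device)
    if dev.type == "cuda":
        if not torch.cuda.is_available():
            raise RuntimeError("CUDA requested but no CUDA devices are visible")
        index = dev.index if dev.index is not None else 0
        if index >= torch.cuda.device_count():
            raise ValueError(
                f"cuda:{index} is out of range; only "
                f"{torch.cuda.device_count()} device(s) are visible"
            )
    return module.to(dev)
```

As the error text suggests, setting `CUDA_LAUNCH_BLOCKING=1` in the environment makes CUDA calls synchronous, so the reported stack points at the failing call rather than a later, unrelated one.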
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:22:06.2257653Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2379 2022-05-18T04:22:06.2358355Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2380 2022-05-18T04:22:07.1463502Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqz88fpkq 2022-05-18T04:22:07.1464716Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqz88fpkq/_remote_module_non_scriptable.py 2022-05-18T04:22:07.1745460Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpssrvp3p_ 2022-05-18T04:22:07.1747654Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpssrvp3p_/_remote_module_non_scriptable.py 2022-05-18T04:22:07.5045952Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:07.5403154Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:09.2442340Z ok (4.719s) 2022-05-18T04:22:09.2442568Z 2022-05-18T04:22:09.2442990Z ---------------------------------------------------------------------- 2022-05-18T04:22:09.2443337Z Ran 1 test in 4.719s 2022-05-18T04:22:09.2443507Z 2022-05-18T04:22:09.2443604Z OK 2022-05-18T04:22:09.2443742Z 2022-05-18T04:22:09.2443879Z Generating XML reports... 2022-05-18T04:22:09.2486810Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518042204.xml 2022-05-18T04:22:10.3986775Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplv0wgrs1 2022-05-18T04:22:10.3987797Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplv0wgrs1/_remote_module_non_scriptable.py 2022-05-18T04:22:10.7635351Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:22:10.7649942Z 2022-05-18T04:22:10.7650192Z Running tests... 2022-05-18T04:22:10.7650641Z ---------------------------------------------------------------------- 2022-05-18T04:22:12.3705347Z test_profiler_remote_cuda (__main__.TensorPipeCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:22:12.4322243Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2573 2022-05-18T04:22:12.4422878Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2574 2022-05-18T04:22:12.4524857Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 2575 2022-05-18T04:22:12.4626750Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 2576 2022-05-18T04:22:13.3296480Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbns4e70i 2022-05-18T04:22:13.3297326Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbns4e70i/_remote_module_non_scriptable.py 2022-05-18T04:22:13.3430263Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppgjnutjb 2022-05-18T04:22:13.3433131Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppgjnutjb/_remote_module_non_scriptable.py 2022-05-18T04:22:13.3602587Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmjv069ql 2022-05-18T04:22:13.3605338Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmjv069ql/_remote_module_non_scriptable.py 2022-05-18T04:22:13.3944695Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr3u74ffm 2022-05-18T04:22:13.3948082Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr3u74ffm/_remote_module_non_scriptable.py 2022-05-18T04:22:13.6865567Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:13.7111601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:13.7122327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:22:13.7566232Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:22:18.1802435Z ok (7.415s) 2022-05-18T04:22:18.1802676Z 2022-05-18T04:22:18.1803084Z ---------------------------------------------------------------------- 2022-05-18T04:22:18.1803443Z Ran 1 test in 7.415s 2022-05-18T04:22:18.1803616Z 2022-05-18T04:22:18.1803720Z OK 2022-05-18T04:22:18.1803859Z 2022-05-18T04:22:18.1803997Z Generating XML reports... 2022-05-18T04:22:18.1848758Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRpcTest-20220518042210.xml 2022-05-18T04:22:19.3549997Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwkcdjj5f 2022-05-18T04:22:19.3550912Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwkcdjj5f/_remote_module_non_scriptable.py 2022-05-18T04:22:19.7258150Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:22:19.7273473Z 2022-05-18T04:22:19.7273929Z Running tests... 2022-05-18T04:22:19.7274431Z ---------------------------------------------------------------------- 2022-05-18T04:22:21.3672370Z test_basic_gloo_ckpt_always (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:22:21.4321759Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 2926 2022-05-18T04:22:21.4423707Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 2927 2022-05-18T04:22:22.3064913Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpixus063_ 2022-05-18T04:22:22.3065776Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpixus063_/_remote_module_non_scriptable.py 2022-05-18T04:22:22.3289388Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpal2t76s5 2022-05-18T04:22:22.3291926Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpal2t76s5/_remote_module_non_scriptable.py 2022-05-18T04:22:22.6637114Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:22.6849145Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:22.8472647Z skip: Need at least 4 CUDA devices (3.119s) 2022-05-18T04:22:22.8473104Z 2022-05-18T04:22:22.8473733Z ---------------------------------------------------------------------- 2022-05-18T04:22:22.8474376Z Ran 1 test in 3.120s 2022-05-18T04:22:22.8474690Z 2022-05-18T04:22:22.8474890Z OK (skipped=1) 2022-05-18T04:22:22.8475544Z 2022-05-18T04:22:22.8475843Z Generating XML reports... 2022-05-18T04:22:22.8519254Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042219.xml 2022-05-18T04:22:24.0102386Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd3fdorpz 2022-05-18T04:22:24.0103517Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd3fdorpz/_remote_module_non_scriptable.py 2022-05-18T04:22:24.3683602Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:22:24.3698528Z 2022-05-18T04:22:24.3699034Z Running tests... 2022-05-18T04:22:24.3699536Z ---------------------------------------------------------------------- 2022-05-18T04:22:25.9801855Z test_basic_gloo_ckpt_except_last (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:22:26.0417693Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3035 2022-05-18T04:22:26.0517905Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3036 2022-05-18T04:22:26.9371393Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfdu1_c1b 2022-05-18T04:22:26.9372894Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfdu1_c1b/_remote_module_non_scriptable.py 2022-05-18T04:22:26.9815793Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_xxgcgf2 2022-05-18T04:22:26.9818420Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_xxgcgf2/_remote_module_non_scriptable.py 2022-05-18T04:22:27.3045769Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:27.3359230Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:27.5569473Z skip: Need at least 4 CUDA devices (3.187s) 2022-05-18T04:22:27.5569750Z 2022-05-18T04:22:27.5570167Z ---------------------------------------------------------------------- 2022-05-18T04:22:27.5570724Z Ran 1 test in 3.187s 2022-05-18T04:22:27.5570901Z 2022-05-18T04:22:27.5571020Z OK (skipped=1) 2022-05-18T04:22:27.5571184Z 2022-05-18T04:22:27.5571316Z Generating XML reports... 2022-05-18T04:22:27.5617046Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042224.xml 2022-05-18T04:22:28.7097489Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphu9qbmvb 2022-05-18T04:22:28.7098674Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphu9qbmvb/_remote_module_non_scriptable.py 2022-05-18T04:22:29.0684527Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:22:29.0699379Z 2022-05-18T04:22:29.0699754Z Running tests... 2022-05-18T04:22:29.0700248Z ---------------------------------------------------------------------- 2022-05-18T04:22:30.6766066Z test_basic_gloo_ckpt_never (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:22:30.7379961Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3144 2022-05-18T04:22:30.7480910Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3145 2022-05-18T04:22:31.6313734Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqr6slodn 2022-05-18T04:22:31.6314876Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqr6slodn/_remote_module_non_scriptable.py 2022-05-18T04:22:31.6361017Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8velszdk 2022-05-18T04:22:31.6363426Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8velszdk/_remote_module_non_scriptable.py 2022-05-18T04:22:31.9873165Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:31.9992457Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:32.1528076Z skip: Need at least 4 CUDA devices (3.083s) 2022-05-18T04:22:32.1528307Z 2022-05-18T04:22:32.1528687Z ---------------------------------------------------------------------- 2022-05-18T04:22:32.1529031Z Ran 1 test in 3.083s 2022-05-18T04:22:32.1529206Z 2022-05-18T04:22:32.1529300Z OK (skipped=1) 2022-05-18T04:22:32.1529457Z 2022-05-18T04:22:32.1529586Z Generating XML reports... 2022-05-18T04:22:32.1573114Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042229.xml 2022-05-18T04:22:33.3092223Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpghypz3dg 2022-05-18T04:22:33.3093100Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpghypz3dg/_remote_module_non_scriptable.py 2022-05-18T04:22:33.6636288Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:22:33.6650725Z 2022-05-18T04:22:33.6650986Z Running tests... 2022-05-18T04:22:33.6651433Z ---------------------------------------------------------------------- 2022-05-18T04:22:35.2718469Z test_basic_gloo_ckpt_never_find_unused (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:22:35.3359062Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3253 2022-05-18T04:22:35.3460906Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3254 2022-05-18T04:22:36.2263328Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1_0slwn8 2022-05-18T04:22:36.2264219Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1_0slwn8/_remote_module_non_scriptable.py 2022-05-18T04:22:36.2312580Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjp8pse9t 2022-05-18T04:22:36.2315314Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjp8pse9t/_remote_module_non_scriptable.py 2022-05-18T04:22:36.5826909Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:36.5981388Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:36.7509658Z skip: Need at least 4 CUDA devices (3.086s) 2022-05-18T04:22:36.7509924Z 2022-05-18T04:22:36.7510327Z ---------------------------------------------------------------------- 2022-05-18T04:22:36.7510671Z Ran 1 test in 3.086s 2022-05-18T04:22:36.7510822Z 2022-05-18T04:22:36.7510936Z OK (skipped=1) 2022-05-18T04:22:36.7511112Z 2022-05-18T04:22:36.7511240Z Generating XML reports... 2022-05-18T04:22:36.7553265Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042233.xml 2022-05-18T04:22:37.9321028Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjlat_9hr 2022-05-18T04:22:37.9322517Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjlat_9hr/_remote_module_non_scriptable.py 2022-05-18T04:22:38.2910519Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:22:38.2925697Z 2022-05-18T04:22:38.2925912Z Running tests... 2022-05-18T04:22:38.2926372Z ---------------------------------------------------------------------- 2022-05-18T04:22:39.9116939Z test_basic_nccl_ckpt_always (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:22:39.9732931Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3362 2022-05-18T04:22:39.9834149Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3363 2022-05-18T04:22:40.8941932Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5enzujhf 2022-05-18T04:22:40.8945402Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5enzujhf/_remote_module_non_scriptable.py 2022-05-18T04:22:40.9622538Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp48s5w43q 2022-05-18T04:22:40.9625037Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp48s5w43q/_remote_module_non_scriptable.py 2022-05-18T04:22:41.2599142Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:41.3258112Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:41.4884676Z skip: Need at least 4 CUDA devices (3.196s) 2022-05-18T04:22:41.4884937Z 2022-05-18T04:22:41.4885360Z ---------------------------------------------------------------------- 2022-05-18T04:22:41.4885709Z Ran 1 test in 3.196s 2022-05-18T04:22:41.4885876Z 2022-05-18T04:22:41.4885969Z OK (skipped=1) 2022-05-18T04:22:41.4886126Z 2022-05-18T04:22:41.4886251Z Generating XML reports... 2022-05-18T04:22:41.4927913Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042238.xml 2022-05-18T04:22:42.6447339Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbsqvf8js 2022-05-18T04:22:42.6448253Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbsqvf8js/_remote_module_non_scriptable.py 2022-05-18T04:22:43.0097458Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:22:43.0112219Z 2022-05-18T04:22:43.0112367Z Running tests... 2022-05-18T04:22:43.0112813Z ---------------------------------------------------------------------- 2022-05-18T04:22:44.6119086Z test_basic_nccl_ckpt_except_last (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:22:44.6733326Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3471 2022-05-18T04:22:44.6836309Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3472 2022-05-18T04:22:45.5567550Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_co6m8c1 2022-05-18T04:22:45.5568727Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_co6m8c1/_remote_module_non_scriptable.py 2022-05-18T04:22:45.5596416Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5g6hapcc 2022-05-18T04:22:45.5599090Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5g6hapcc/_remote_module_non_scriptable.py 2022-05-18T04:22:45.9178352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:45.9230582Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:46.0887666Z skip: Need at least 4 CUDA devices (3.077s) 2022-05-18T04:22:46.0888090Z 2022-05-18T04:22:46.0888702Z ---------------------------------------------------------------------- 2022-05-18T04:22:46.0889044Z Ran 1 test in 3.077s 2022-05-18T04:22:46.0889216Z 2022-05-18T04:22:46.0889332Z OK (skipped=1) 2022-05-18T04:22:46.0889499Z 2022-05-18T04:22:46.0889646Z Generating XML reports... 2022-05-18T04:22:46.0932725Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042243.xml 2022-05-18T04:22:47.2523809Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgehx58z3 2022-05-18T04:22:47.2525081Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgehx58z3/_remote_module_non_scriptable.py 2022-05-18T04:22:47.6227219Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:22:47.6242459Z 2022-05-18T04:22:47.6242687Z Running tests... 2022-05-18T04:22:47.6243140Z ---------------------------------------------------------------------- 2022-05-18T04:22:49.2687065Z test_basic_nccl_ckpt_never (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:22:49.3317681Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3580 2022-05-18T04:22:49.3420657Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3581 2022-05-18T04:22:50.2919373Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp29dmvnjo 2022-05-18T04:22:50.2920449Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp29dmvnjo/_remote_module_non_scriptable.py 2022-05-18T04:22:50.3020416Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0cxuhill 2022-05-18T04:22:50.3023139Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0cxuhill/_remote_module_non_scriptable.py 2022-05-18T04:22:50.6439972Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:50.6766028Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:50.8470198Z skip: Need at least 4 CUDA devices (3.222s) 2022-05-18T04:22:50.8470469Z 2022-05-18T04:22:50.8470875Z ---------------------------------------------------------------------- 2022-05-18T04:22:50.8471226Z Ran 1 test in 3.223s 2022-05-18T04:22:50.8471395Z 2022-05-18T04:22:50.8471508Z OK (skipped=1) 2022-05-18T04:22:50.8473577Z 2022-05-18T04:22:50.8474007Z Generating XML reports... 2022-05-18T04:22:50.8515221Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042247.xml 2022-05-18T04:22:51.9763803Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgg49ltai 2022-05-18T04:22:51.9764815Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgg49ltai/_remote_module_non_scriptable.py 2022-05-18T04:22:52.3495318Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:22:52.3510553Z 2022-05-18T04:22:52.3510846Z Running tests... 2022-05-18T04:22:52.3511295Z ---------------------------------------------------------------------- 2022-05-18T04:22:53.9979664Z test_basic_nccl_ckpt_never_find_unused (__main__.TensorPipePipeWithDDPTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:22:54.0625979Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3689 2022-05-18T04:22:54.0728104Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3690 2022-05-18T04:22:54.9917749Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj1mhmj41 2022-05-18T04:22:54.9919086Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj1mhmj41/_remote_module_non_scriptable.py 2022-05-18T04:22:55.0024811Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_vtgcy_k 2022-05-18T04:22:55.0027537Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_vtgcy_k/_remote_module_non_scriptable.py 2022-05-18T04:22:55.3633190Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:22:55.3666304Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:22:55.5777947Z skip: Need at least 4 CUDA devices (3.226s) 2022-05-18T04:22:55.5778353Z 2022-05-18T04:22:55.5779052Z ---------------------------------------------------------------------- 2022-05-18T04:22:55.5779754Z Ran 1 test in 3.227s 2022-05-18T04:22:55.5779955Z 2022-05-18T04:22:55.5780071Z OK (skipped=1) 2022-05-18T04:22:55.5780243Z 2022-05-18T04:22:55.5780354Z Generating XML reports... 2022-05-18T04:22:55.5823354Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042252.xml 2022-05-18T04:22:56.7566668Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3kq535z4 2022-05-18T04:22:56.7567744Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3kq535z4/_remote_module_non_scriptable.py 2022-05-18T04:22:57.1266629Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:22:57.1281675Z 2022-05-18T04:22:57.1282166Z Running tests... 2022-05-18T04:22:57.1282643Z ---------------------------------------------------------------------- 2022-05-18T04:22:58.7748929Z test_async_execution_nested_with_cuda_future (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:22:58.8384605Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 3798 2022-05-18T04:22:58.8486909Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 3799 2022-05-18T04:22:58.8590669Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 3800 2022-05-18T04:22:58.8697780Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 3801 2022-05-18T04:22:59.7562020Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprsc5e57s 2022-05-18T04:22:59.7562626Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprsc5e57s/_remote_module_non_scriptable.py 2022-05-18T04:22:59.7732959Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjs3f9in_ 2022-05-18T04:22:59.7735757Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjs3f9in_/_remote_module_non_scriptable.py 2022-05-18T04:22:59.8018434Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzuqu734p 2022-05-18T04:22:59.8021417Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzuqu734p/_remote_module_non_scriptable.py 2022-05-18T04:22:59.8120957Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxmvrj5tl 2022-05-18T04:22:59.8123972Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxmvrj5tl/_remote_module_non_scriptable.py 2022-05-18T04:23:00.1124168Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:00.1248605Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:23:00.1728922Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:23:00.1808598Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:05.2867960Z ok (8.158s) 2022-05-18T04:23:05.2868188Z 2022-05-18T04:23:05.2868602Z ---------------------------------------------------------------------- 2022-05-18T04:23:05.2868935Z Ran 1 test in 8.159s 2022-05-18T04:23:05.2869103Z 2022-05-18T04:23:05.2869207Z OK 2022-05-18T04:23:05.2869344Z 2022-05-18T04:23:05.2869484Z Generating XML reports... 2022-05-18T04:23:05.2913562Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042257.xml 2022-05-18T04:23:06.4575847Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdrrinrnx 2022-05-18T04:23:06.4576877Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdrrinrnx/_remote_module_non_scriptable.py 2022-05-18T04:23:06.8279566Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:23:06.8294907Z 2022-05-18T04:23:06.8295264Z Running tests... 2022-05-18T04:23:06.8295693Z ---------------------------------------------------------------------- 2022-05-18T04:23:08.4761571Z test_async_execution_with_cuda_future (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:23:08.5399508Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4151 2022-05-18T04:23:08.5502362Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4152 2022-05-18T04:23:08.5605746Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 4153 2022-05-18T04:23:08.5709966Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 4154 2022-05-18T04:23:09.4887872Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6o2zkf5t 2022-05-18T04:23:09.4889116Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6o2zkf5t/_remote_module_non_scriptable.py 2022-05-18T04:23:09.5374420Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyl5ky_4j 2022-05-18T04:23:09.5376852Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyl5ky_4j/_remote_module_non_scriptable.py 2022-05-18T04:23:09.5392406Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn7_ju144 2022-05-18T04:23:09.5395277Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn7_ju144/_remote_module_non_scriptable.py 2022-05-18T04:23:09.5540369Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi_abe_nf 2022-05-18T04:23:09.5542819Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi_abe_nf/_remote_module_non_scriptable.py 2022-05-18T04:23:09.8498460Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:09.8904689Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:23:09.9085796Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:09.9096348Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:23:17.1918416Z ok (10.362s) 2022-05-18T04:23:17.1918849Z 2022-05-18T04:23:17.1919262Z ---------------------------------------------------------------------- 2022-05-18T04:23:17.1919630Z Ran 1 test in 10.362s 2022-05-18T04:23:17.1919802Z 2022-05-18T04:23:17.1919904Z OK 2022-05-18T04:23:17.1920049Z 2022-05-18T04:23:17.1920185Z Generating XML reports... 2022-05-18T04:23:17.1964256Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042306.xml 2022-05-18T04:23:18.3450005Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp67xehzfo 2022-05-18T04:23:18.3451342Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp67xehzfo/_remote_module_non_scriptable.py 2022-05-18T04:23:18.7026274Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:23:18.7040634Z 2022-05-18T04:23:18.7040898Z Running tests... 2022-05-18T04:23:18.7041505Z ---------------------------------------------------------------------- 2022-05-18T04:23:20.2901864Z test_cuda_future_callback_changes_devices (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:23:20.3528142Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4504 2022-05-18T04:23:20.3630227Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4505 2022-05-18T04:23:20.3734393Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 4506 2022-05-18T04:23:20.3837603Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 4507 2022-05-18T04:23:21.2581674Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcv4k8mjx 2022-05-18T04:23:21.2582873Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcv4k8mjx/_remote_module_non_scriptable.py 2022-05-18T04:23:21.2948572Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5_bq45i3 2022-05-18T04:23:21.2950486Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5_bq45i3/_remote_module_non_scriptable.py 2022-05-18T04:23:21.3016370Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp54uyankt 2022-05-18T04:23:21.3018507Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp54uyankt/_remote_module_non_scriptable.py 2022-05-18T04:23:21.3356241Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkesoivzz 2022-05-18T04:23:21.3358130Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkesoivzz/_remote_module_non_scriptable.py 2022-05-18T04:23:21.6190234Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:21.6501534Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:23:21.6606436Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:23:21.7002781Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:28.4031240Z ok (9.699s) 2022-05-18T04:23:28.4031488Z 2022-05-18T04:23:28.4031909Z ---------------------------------------------------------------------- 2022-05-18T04:23:28.4032286Z Ran 1 test in 9.699s 2022-05-18T04:23:28.4032458Z 2022-05-18T04:23:28.4032558Z OK 2022-05-18T04:23:28.4032678Z 2022-05-18T04:23:28.4032819Z Generating XML reports... 2022-05-18T04:23:28.4076421Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042318.xml 2022-05-18T04:23:29.5784339Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsr609eoc 2022-05-18T04:23:29.5785626Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsr609eoc/_remote_module_non_scriptable.py 2022-05-18T04:23:29.9505165Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:23:29.9520614Z 2022-05-18T04:23:29.9520825Z Running tests... 2022-05-18T04:23:29.9521261Z ---------------------------------------------------------------------- 2022-05-18T04:23:31.5866378Z test_cuda_future_can_extract_cuda_sparse_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:23:31.6520285Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4693 2022-05-18T04:23:31.6623467Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4694 2022-05-18T04:23:31.6726765Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 4695 2022-05-18T04:23:31.6831746Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 4696 2022-05-18T04:23:32.5500140Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxgbejx0i 2022-05-18T04:23:32.5501005Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxgbejx0i/_remote_module_non_scriptable.py 2022-05-18T04:23:32.5548736Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpespx4f0d 2022-05-18T04:23:32.5551351Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpespx4f0d/_remote_module_non_scriptable.py 2022-05-18T04:23:32.5584764Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_stygvf0 2022-05-18T04:23:32.5587747Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_stygvf0/_remote_module_non_scriptable.py 2022-05-18T04:23:32.5771549Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbgcnm46b 2022-05-18T04:23:32.5774532Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbgcnm46b/_remote_module_non_scriptable.py 2022-05-18T04:23:32.9051471Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:32.9095887Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:32.9175621Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:23:32.9433944Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:23:38.6003536Z ok (8.648s) 2022-05-18T04:23:38.6004150Z 2022-05-18T04:23:38.6004584Z ---------------------------------------------------------------------- 2022-05-18T04:23:38.6004959Z Ran 1 test in 8.648s 2022-05-18T04:23:38.6005150Z 2022-05-18T04:23:38.6005250Z OK 2022-05-18T04:23:38.6006568Z 2022-05-18T04:23:38.6006951Z Generating XML reports... 2022-05-18T04:23:38.6050593Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042329.xml 2022-05-18T04:23:39.7550595Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp43m262yi 2022-05-18T04:23:39.7551596Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp43m262yi/_remote_module_non_scriptable.py 2022-05-18T04:23:40.1204866Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:23:40.1219862Z 2022-05-18T04:23:40.1220266Z Running tests... 2022-05-18T04:23:40.1220785Z ---------------------------------------------------------------------- 2022-05-18T04:23:41.7320718Z test_cuda_future_can_extract_cuda_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:23:41.7952455Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 4938 2022-05-18T04:23:41.8054531Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 4939 2022-05-18T04:23:41.8160032Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 4940 2022-05-18T04:23:41.8266033Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 4941 2022-05-18T04:23:42.6958509Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6bc41l_w 2022-05-18T04:23:42.6959160Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6bc41l_w/_remote_module_non_scriptable.py 2022-05-18T04:23:42.7260099Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2fvq0exq 2022-05-18T04:23:42.7262974Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2fvq0exq/_remote_module_non_scriptable.py 2022-05-18T04:23:42.7721718Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwrwyqn83 2022-05-18T04:23:42.7724104Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwrwyqn83/_remote_module_non_scriptable.py 2022-05-18T04:23:42.7725193Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzxqcts78 2022-05-18T04:23:42.7728087Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzxqcts78/_remote_module_non_scriptable.py 2022-05-18T04:23:43.0515746Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:23:43.0947670Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:43.1309364Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:23:43.1405307Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:48.5430172Z ok (8.421s) 2022-05-18T04:23:48.5430410Z 2022-05-18T04:23:48.5430815Z ---------------------------------------------------------------------- 2022-05-18T04:23:48.5431167Z Ran 1 test in 8.421s 2022-05-18T04:23:48.5431336Z 2022-05-18T04:23:48.5431422Z OK 2022-05-18T04:23:48.5431558Z 2022-05-18T04:23:48.5431714Z Generating XML reports... 2022-05-18T04:23:48.5475674Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042340.xml 2022-05-18T04:23:49.7022219Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0m6rrpqz 2022-05-18T04:23:49.7023070Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0m6rrpqz/_remote_module_non_scriptable.py 2022-05-18T04:23:50.0588200Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:23:50.0602616Z 2022-05-18T04:23:50.0602863Z Running tests... 2022-05-18T04:23:50.0603318Z ---------------------------------------------------------------------- 2022-05-18T04:23:51.6920808Z test_cuda_future_can_extract_custom_class_with_cuda_sparse_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:23:51.7538509Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5123 2022-05-18T04:23:51.7639999Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5124 2022-05-18T04:23:51.7745038Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5125 2022-05-18T04:23:51.7849734Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5126 2022-05-18T04:23:52.6864336Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpueog_q75 2022-05-18T04:23:52.6865423Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpueog_q75/_remote_module_non_scriptable.py 2022-05-18T04:23:52.6885338Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi0pxkoon 2022-05-18T04:23:52.6887975Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi0pxkoon/_remote_module_non_scriptable.py 2022-05-18T04:23:52.7124119Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzqnzwms0 2022-05-18T04:23:52.7127478Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzqnzwms0/_remote_module_non_scriptable.py 2022-05-18T04:23:52.7183291Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4n0665ti 2022-05-18T04:23:52.7186083Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4n0665ti/_remote_module_non_scriptable.py 2022-05-18T04:23:53.0431903Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:23:53.0545786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:23:53.0684169Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:23:53.0748228Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:23:58.7022817Z ok (8.642s) 2022-05-18T04:23:58.7023056Z 2022-05-18T04:23:58.7023473Z ---------------------------------------------------------------------- 2022-05-18T04:23:58.7023843Z Ran 1 test in 8.642s 2022-05-18T04:23:58.7024019Z 2022-05-18T04:23:58.7024116Z OK 2022-05-18T04:23:58.7024234Z 2022-05-18T04:23:58.7024372Z Generating XML reports... 2022-05-18T04:23:58.7066890Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042350.xml 2022-05-18T04:23:59.8709760Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwuh22tir 2022-05-18T04:23:59.8710730Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwuh22tir/_remote_module_non_scriptable.py 2022-05-18T04:24:00.2435090Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:24:00.2451145Z 2022-05-18T04:24:00.2451579Z Running tests... 2022-05-18T04:24:00.2452004Z ---------------------------------------------------------------------- 2022-05-18T04:24:01.8984383Z test_cuda_future_can_extract_custom_class_with_cuda_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:24:01.9692912Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5372 2022-05-18T04:24:01.9797772Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5373 2022-05-18T04:24:01.9907564Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5374 2022-05-18T04:24:02.0014203Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5375 2022-05-18T04:24:02.8967655Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkehhna3b 2022-05-18T04:24:02.8968960Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkehhna3b/_remote_module_non_scriptable.py 2022-05-18T04:24:02.8978543Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbz6i7h64 2022-05-18T04:24:02.8981185Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbz6i7h64/_remote_module_non_scriptable.py 2022-05-18T04:24:02.9368104Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps5pljyag 2022-05-18T04:24:02.9370665Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps5pljyag/_remote_module_non_scriptable.py 2022-05-18T04:24:02.9449546Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgd1qyrfg 2022-05-18T04:24:02.9452136Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgd1qyrfg/_remote_module_non_scriptable.py 2022-05-18T04:24:03.2555554Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:24:03.2708058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:03.2954318Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:24:03.2964657Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:08.9185542Z ok (8.673s) 2022-05-18T04:24:08.9185995Z 2022-05-18T04:24:08.9186792Z ---------------------------------------------------------------------- 2022-05-18T04:24:08.9187227Z Ran 1 test in 8.673s 2022-05-18T04:24:08.9187380Z 2022-05-18T04:24:08.9187478Z OK 2022-05-18T04:24:08.9187616Z 2022-05-18T04:24:08.9187756Z Generating XML reports... 2022-05-18T04:24:08.9231156Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042400.xml 2022-05-18T04:24:10.0840804Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd3rqh5bi 2022-05-18T04:24:10.0842067Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd3rqh5bi/_remote_module_non_scriptable.py 2022-05-18T04:24:10.4392981Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:24:10.4406943Z 2022-05-18T04:24:10.4407164Z Running tests... 2022-05-18T04:24:10.4408030Z ---------------------------------------------------------------------- 2022-05-18T04:24:12.0600474Z test_cuda_future_can_extract_list_with_cuda_sparse_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:24:12.1225152Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5561 2022-05-18T04:24:12.1328235Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5562 2022-05-18T04:24:12.1439406Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5563 2022-05-18T04:24:12.1548746Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5564 2022-05-18T04:24:13.0209064Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq6_5hmb8 2022-05-18T04:24:13.0210137Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq6_5hmb8/_remote_module_non_scriptable.py 2022-05-18T04:24:13.0475857Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppvlu1ccs 2022-05-18T04:24:13.0478491Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppvlu1ccs/_remote_module_non_scriptable.py 2022-05-18T04:24:13.0835055Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgrv9d82s 2022-05-18T04:24:13.0837773Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgrv9d82s/_remote_module_non_scriptable.py 2022-05-18T04:24:13.0989428Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4jd9y10_ 2022-05-18T04:24:13.0991690Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4jd9y10_/_remote_module_non_scriptable.py 2022-05-18T04:24:13.3739176Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:13.4172157Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:13.4446009Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:24:13.4608055Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:24:19.0732185Z ok (8.632s) 2022-05-18T04:24:19.0733795Z 2022-05-18T04:24:19.0734555Z ---------------------------------------------------------------------- 2022-05-18T04:24:19.0734942Z Ran 1 test in 8.632s 2022-05-18T04:24:19.0735090Z 2022-05-18T04:24:19.0735189Z OK 2022-05-18T04:24:19.0735328Z 2022-05-18T04:24:19.0735468Z Generating XML reports... 2022-05-18T04:24:19.0777379Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042410.xml 2022-05-18T04:24:20.2313670Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1432wldc 2022-05-18T04:24:20.2315024Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1432wldc/_remote_module_non_scriptable.py 2022-05-18T04:24:20.5905930Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:24:20.5920509Z 2022-05-18T04:24:20.5920680Z Running tests... 2022-05-18T04:24:20.5921111Z ---------------------------------------------------------------------- 2022-05-18T04:24:22.1854126Z test_cuda_future_can_extract_list_with_cuda_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:24:22.2485408Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5806 2022-05-18T04:24:22.2590917Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5807 2022-05-18T04:24:22.2694614Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5808 2022-05-18T04:24:22.2799826Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5809 2022-05-18T04:24:23.1462554Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg52aiuzv 2022-05-18T04:24:23.1463719Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg52aiuzv/_remote_module_non_scriptable.py 2022-05-18T04:24:23.1496749Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdykk5jfj 2022-05-18T04:24:23.1499767Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdykk5jfj/_remote_module_non_scriptable.py 2022-05-18T04:24:23.1551296Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7xgnnsex 2022-05-18T04:24:23.1553477Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7xgnnsex/_remote_module_non_scriptable.py 2022-05-18T04:24:23.1853340Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfjpyjupr 2022-05-18T04:24:23.1856356Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfjpyjupr/_remote_module_non_scriptable.py 2022-05-18T04:24:23.5011443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:23.5116715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:24:23.5132965Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:24:23.5407805Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:29.1967209Z ok (8.604s) 2022-05-18T04:24:29.1967422Z 2022-05-18T04:24:29.1967826Z ---------------------------------------------------------------------- 2022-05-18T04:24:29.1968174Z Ran 1 test in 8.605s 2022-05-18T04:24:29.1968341Z 2022-05-18T04:24:29.1968440Z OK 2022-05-18T04:24:29.1970720Z 2022-05-18T04:24:29.1971109Z Generating XML reports... 2022-05-18T04:24:29.2011598Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042420.xml 2022-05-18T04:24:30.3525763Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgqcgzjit 2022-05-18T04:24:30.3526374Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgqcgzjit/_remote_module_non_scriptable.py 2022-05-18T04:24:30.7161325Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:24:30.7175843Z 2022-05-18T04:24:30.7176357Z Running tests... 2022-05-18T04:24:30.7176867Z ---------------------------------------------------------------------- 2022-05-18T04:24:32.3186028Z test_cuda_future_device_as_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
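[editor's note] The three runs that follow (test_cuda_future_device_as_device, _as_int, _as_str) appear to check that the devices= argument accepts several spellings of the same GPU. A hedged sketch of what that looks like at the API level, assuming device index 0 exists; the int form is inferred from the test name:

    import torch
    from torch.futures import Future

    # Presumably-equivalent ways to name the same device for a CUDA-aware Future.
    futs = [
        Future(devices=[torch.device("cuda", 0)]),  # as torch.device
        Future(devices=[0]),                        # as int (inferred from the test name)
        Future(devices=["cuda:0"]),                 # as str
    ]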
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:24:32.3821941Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 5991 2022-05-18T04:24:32.3923822Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 5992 2022-05-18T04:24:32.4029744Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 5993 2022-05-18T04:24:32.4137143Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 5994 2022-05-18T04:24:33.3442361Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp65urlvb7 2022-05-18T04:24:33.3443256Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp65urlvb7/_remote_module_non_scriptable.py 2022-05-18T04:24:33.3662547Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprqs4xll3 2022-05-18T04:24:33.3665147Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprqs4xll3/_remote_module_non_scriptable.py 2022-05-18T04:24:33.3734612Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnwy3hedx 2022-05-18T04:24:33.3737501Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnwy3hedx/_remote_module_non_scriptable.py 2022-05-18T04:24:33.3934368Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph4bzki2u 2022-05-18T04:24:33.3937273Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph4bzki2u/_remote_module_non_scriptable.py 2022-05-18T04:24:33.6962637Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:33.7248372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:24:33.7334942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:33.7655460Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:24:33.9207397Z ok (3.201s) 2022-05-18T04:24:33.9207652Z 2022-05-18T04:24:33.9208082Z ---------------------------------------------------------------------- 2022-05-18T04:24:33.9208441Z Ran 1 test in 3.202s 2022-05-18T04:24:33.9208619Z 2022-05-18T04:24:33.9208717Z OK 2022-05-18T04:24:33.9208860Z 2022-05-18T04:24:33.9209001Z Generating XML reports... 2022-05-18T04:24:33.9238685Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042430.xml 2022-05-18T04:24:35.0984094Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt511p_9c 2022-05-18T04:24:35.0985582Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt511p_9c/_remote_module_non_scriptable.py 2022-05-18T04:24:35.4685727Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:24:35.4701846Z 2022-05-18T04:24:35.4702163Z Running tests... 2022-05-18T04:24:35.4702604Z ---------------------------------------------------------------------- 2022-05-18T04:24:37.1117799Z test_cuda_future_device_as_int (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:24:37.1747406Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6172 2022-05-18T04:24:37.1851458Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6173 2022-05-18T04:24:37.1955522Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 6174 2022-05-18T04:24:37.2062667Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 6175 2022-05-18T04:24:38.0811445Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpypdsiem4 2022-05-18T04:24:38.0812067Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpypdsiem4/_remote_module_non_scriptable.py 2022-05-18T04:24:38.0850348Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3tdjwryt 2022-05-18T04:24:38.0853516Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3tdjwryt/_remote_module_non_scriptable.py 2022-05-18T04:24:38.0885754Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5uvoyup3 2022-05-18T04:24:38.0888778Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5uvoyup3/_remote_module_non_scriptable.py 2022-05-18T04:24:38.1239427Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvscvsti1 2022-05-18T04:24:38.1241759Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvscvsti1/_remote_module_non_scriptable.py 2022-05-18T04:24:38.4385422Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:24:38.4387098Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:38.4566361Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:38.4848439Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:24:38.7121181Z ok (3.242s) 2022-05-18T04:24:38.7121388Z 2022-05-18T04:24:38.7121801Z ---------------------------------------------------------------------- 2022-05-18T04:24:38.7122162Z Ran 1 test in 3.242s 2022-05-18T04:24:38.7122336Z 2022-05-18T04:24:38.7122440Z OK 2022-05-18T04:24:38.7122586Z 2022-05-18T04:24:38.7122725Z Generating XML reports... 2022-05-18T04:24:38.7169582Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042435.xml 2022-05-18T04:24:39.8762202Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8iaskisk 2022-05-18T04:24:39.8763675Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8iaskisk/_remote_module_non_scriptable.py 2022-05-18T04:24:40.2376677Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:24:40.2391069Z 2022-05-18T04:24:40.2391357Z Running tests... 2022-05-18T04:24:40.2391851Z ---------------------------------------------------------------------- 2022-05-18T04:24:41.8631137Z test_cuda_future_device_as_str (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:24:41.9262759Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6353 2022-05-18T04:24:41.9365139Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6354 2022-05-18T04:24:41.9469582Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 6355 2022-05-18T04:24:41.9575530Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 6356 2022-05-18T04:24:42.8227343Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7vi3tcp4 2022-05-18T04:24:42.8228498Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7vi3tcp4/_remote_module_non_scriptable.py 2022-05-18T04:24:42.8287529Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph76c86sm 2022-05-18T04:24:42.8290472Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph76c86sm/_remote_module_non_scriptable.py 2022-05-18T04:24:42.8568449Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1m20e6_3 2022-05-18T04:24:42.8571030Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1m20e6_3/_remote_module_non_scriptable.py 2022-05-18T04:24:42.8662352Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpghiip8l9 2022-05-18T04:24:42.8664936Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpghiip8l9/_remote_module_non_scriptable.py 2022-05-18T04:24:43.1810962Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:43.1884997Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:24:43.2135902Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:43.2228294Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:24:43.4632708Z ok (3.224s) 2022-05-18T04:24:43.4632984Z 2022-05-18T04:24:43.4633410Z ---------------------------------------------------------------------- 2022-05-18T04:24:43.4633782Z Ran 1 test in 3.224s 2022-05-18T04:24:43.4633961Z 2022-05-18T04:24:43.4634062Z OK 2022-05-18T04:24:43.4634180Z 2022-05-18T04:24:43.4634321Z Generating XML reports... 2022-05-18T04:24:43.4679946Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042440.xml 2022-05-18T04:24:44.6420514Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4685aseu 2022-05-18T04:24:44.6421975Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4685aseu/_remote_module_non_scriptable.py 2022-05-18T04:24:45.0162357Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:24:45.0178936Z 2022-05-18T04:24:45.0179316Z Running tests... 2022-05-18T04:24:45.0179837Z ---------------------------------------------------------------------- 2022-05-18T04:24:46.6829961Z test_cuda_future_device_not_cuda (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:24:46.7473251Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6534 2022-05-18T04:24:46.7577537Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6535 2022-05-18T04:24:46.7684538Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 6536 2022-05-18T04:24:46.7791853Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 6537 2022-05-18T04:24:47.6568956Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqqj0alq0 2022-05-18T04:24:47.6569902Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqqj0alq0/_remote_module_non_scriptable.py 2022-05-18T04:24:47.6783941Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1yn68kaz 2022-05-18T04:24:47.6786666Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1yn68kaz/_remote_module_non_scriptable.py 2022-05-18T04:24:47.6845640Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp48wxv1b9 2022-05-18T04:24:47.6848657Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp48wxv1b9/_remote_module_non_scriptable.py 2022-05-18T04:24:47.6958481Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjl0w_pnn 2022-05-18T04:24:47.6961843Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjl0w_pnn/_remote_module_non_scriptable.py 2022-05-18T04:24:48.0101432Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:48.0373533Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:48.0473581Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:24:48.0622640Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:24:48.2849531Z ok (3.267s) 2022-05-18T04:24:48.2849778Z 2022-05-18T04:24:48.2850167Z ---------------------------------------------------------------------- 2022-05-18T04:24:48.2850746Z Ran 1 test in 3.267s 2022-05-18T04:24:48.2850917Z 2022-05-18T04:24:48.2851016Z OK 2022-05-18T04:24:48.2851157Z 2022-05-18T04:24:48.2851293Z Generating XML reports... 2022-05-18T04:24:48.2894762Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042445.xml 2022-05-18T04:24:49.4428862Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm8p1wc9p 2022-05-18T04:24:49.4429713Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm8p1wc9p/_remote_module_non_scriptable.py 2022-05-18T04:24:49.7990215Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:24:49.8004661Z 2022-05-18T04:24:49.8004963Z Running tests... 2022-05-18T04:24:49.8005416Z ---------------------------------------------------------------------- 2022-05-18T04:24:51.4009434Z test_cuda_future_modify_tensor_inplace (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:24:51.4630732Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6715 2022-05-18T04:24:51.4733946Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6716 2022-05-18T04:24:51.4839037Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 6717 2022-05-18T04:24:51.4945630Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 6718 2022-05-18T04:24:52.4205922Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_14dw1um 2022-05-18T04:24:52.4206589Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_14dw1um/_remote_module_non_scriptable.py 2022-05-18T04:24:52.4271489Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp625dyenf 2022-05-18T04:24:52.4274121Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp625dyenf/_remote_module_non_scriptable.py 2022-05-18T04:24:52.4297030Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe_6_2el9 2022-05-18T04:24:52.4299891Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe_6_2el9/_remote_module_non_scriptable.py 2022-05-18T04:24:52.4369372Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnalztj6_ 2022-05-18T04:24:52.4372403Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnalztj6_/_remote_module_non_scriptable.py 2022-05-18T04:24:52.7742341Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:24:52.7865562Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:52.7931151Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:52.8027153Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:24:54.6040972Z ok (4.803s) 2022-05-18T04:24:54.6045742Z 2022-05-18T04:24:54.6046523Z ---------------------------------------------------------------------- 2022-05-18T04:24:54.6047164Z Ran 1 test in 4.804s 2022-05-18T04:24:54.6047475Z 2022-05-18T04:24:54.6047652Z OK 2022-05-18T04:24:54.6047911Z 2022-05-18T04:24:54.6048231Z Generating XML reports... 2022-05-18T04:24:54.6092860Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042449.xml 2022-05-18T04:24:55.7790979Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1kwgplfs 2022-05-18T04:24:55.7791852Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1kwgplfs/_remote_module_non_scriptable.py 2022-05-18T04:24:56.1494871Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:24:56.1510613Z 2022-05-18T04:24:56.1511325Z Running tests... 2022-05-18T04:24:56.1512064Z ---------------------------------------------------------------------- 2022-05-18T04:24:57.7817170Z test_cuda_future_replace_tensor (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:24:57.8460767Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 6900 2022-05-18T04:24:57.8563488Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 6901 2022-05-18T04:24:57.8671630Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 6902 2022-05-18T04:24:57.8777636Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 6903 2022-05-18T04:24:58.7728527Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpugs3qiqi 2022-05-18T04:24:58.7729790Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpugs3qiqi/_remote_module_non_scriptable.py 2022-05-18T04:24:58.8018599Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3pig4v4c 2022-05-18T04:24:58.8020917Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3pig4v4c/_remote_module_non_scriptable.py 2022-05-18T04:24:58.8043188Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjiaafj8r 2022-05-18T04:24:58.8046038Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjiaafj8r/_remote_module_non_scriptable.py 2022-05-18T04:24:58.8253726Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6kaudhy5 2022-05-18T04:24:58.8256875Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6kaudhy5/_remote_module_non_scriptable.py 2022-05-18T04:24:59.1292606Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:24:59.1623082Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:24:59.1646209Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:24:59.1864568Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:25:00.9873944Z ok (4.836s) 2022-05-18T04:25:00.9874359Z 2022-05-18T04:25:00.9875045Z ---------------------------------------------------------------------- 2022-05-18T04:25:00.9875662Z Ran 1 test in 4.836s 2022-05-18T04:25:00.9875964Z 2022-05-18T04:25:00.9876117Z OK 2022-05-18T04:25:00.9876374Z 2022-05-18T04:25:00.9876622Z Generating XML reports... 2022-05-18T04:25:00.9921211Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042456.xml 2022-05-18T04:25:02.1744301Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwj_zed5v 2022-05-18T04:25:02.1745176Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwj_zed5v/_remote_module_non_scriptable.py 2022-05-18T04:25:02.5524915Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:25:02.5540736Z 2022-05-18T04:25:02.5540971Z Running tests... 2022-05-18T04:25:02.5541401Z ---------------------------------------------------------------------- 2022-05-18T04:25:04.2325737Z test_cuda_future_value_on_bad_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:25:04.2976393Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7085 2022-05-18T04:25:04.3079694Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7086 2022-05-18T04:25:04.3186461Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 7087 2022-05-18T04:25:04.3298569Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 7088 2022-05-18T04:25:05.3061052Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvayja77o 2022-05-18T04:25:05.3062233Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvayja77o/_remote_module_non_scriptable.py 2022-05-18T04:25:05.3202216Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg8_3c1g9 2022-05-18T04:25:05.3204961Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg8_3c1g9/_remote_module_non_scriptable.py 2022-05-18T04:25:05.3217595Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppwr0tgyq 2022-05-18T04:25:05.3220193Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppwr0tgyq/_remote_module_non_scriptable.py 2022-05-18T04:25:05.3561119Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm_h2xoa6 2022-05-18T04:25:05.3563738Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm_h2xoa6/_remote_module_non_scriptable.py 2022-05-18T04:25:05.6578660Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:25:05.6787466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:05.6864170Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:05.7263072Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:25:12.7496899Z ok (10.195s) 2022-05-18T04:25:12.7497163Z 2022-05-18T04:25:12.7497563Z ---------------------------------------------------------------------- 2022-05-18T04:25:12.7497920Z Ran 1 test in 10.196s 2022-05-18T04:25:12.7498068Z 2022-05-18T04:25:12.7498169Z OK 2022-05-18T04:25:12.7498305Z 2022-05-18T04:25:12.7498442Z Generating XML reports... 2022-05-18T04:25:12.7540965Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042502.xml 2022-05-18T04:25:13.9196454Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp78jcjypa 2022-05-18T04:25:13.9197652Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp78jcjypa/_remote_module_non_scriptable.py 2022-05-18T04:25:14.2915022Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:25:14.2929968Z 2022-05-18T04:25:14.2930216Z Running tests... 2022-05-18T04:25:14.2930667Z ---------------------------------------------------------------------- 2022-05-18T04:25:15.9311611Z test_custom_stream (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
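[editor's note] test_custom_stream and its _multi/_nested variants, which run next, center on issuing work on a non-default CUDA stream and synchronizing it back before use. A minimal sketch of that stream pattern (illustrative only, assumes a GPU is present; not the test's own code):

    import torch

    side = torch.cuda.Stream()
    x = torch.randn(1024, 1024, device="cuda")

    with torch.cuda.stream(side):
        y = x @ x  # kernel enqueued on the side stream

    # Make the default stream wait for the side stream before consuming y.
    torch.cuda.current_stream().wait_stream(side)
    print(y.sum().item())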
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:25:15.9954413Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7274 2022-05-18T04:25:16.0058505Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7275 2022-05-18T04:25:16.0164602Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 7276 2022-05-18T04:25:16.0272534Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 7277 2022-05-18T04:25:16.9329543Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8wzlr5eh 2022-05-18T04:25:16.9330550Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8wzlr5eh/_remote_module_non_scriptable.py 2022-05-18T04:25:16.9497387Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpioqebcxn 2022-05-18T04:25:16.9500243Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpioqebcxn/_remote_module_non_scriptable.py 2022-05-18T04:25:16.9590083Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqdmm_j35 2022-05-18T04:25:16.9592945Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqdmm_j35/_remote_module_non_scriptable.py 2022-05-18T04:25:16.9977929Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd9jen_60 2022-05-18T04:25:16.9980464Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd9jen_60/_remote_module_non_scriptable.py 2022-05-18T04:25:17.2862672Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:17.3055824Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:25:17.3143451Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:17.3634306Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:25:25.1502221Z ok (10.857s) 2022-05-18T04:25:25.1502563Z 2022-05-18T04:25:25.1503188Z ---------------------------------------------------------------------- 2022-05-18T04:25:25.1503585Z Ran 1 test in 10.857s 2022-05-18T04:25:25.1503884Z 2022-05-18T04:25:25.1504070Z OK 2022-05-18T04:25:25.1504363Z 2022-05-18T04:25:25.1504748Z Generating XML reports... 2022-05-18T04:25:25.1549352Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042514.xml 2022-05-18T04:25:26.3242538Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpim6mgp0m 2022-05-18T04:25:26.3243524Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpim6mgp0m/_remote_module_non_scriptable.py 2022-05-18T04:25:26.6875587Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:25:26.6890448Z 2022-05-18T04:25:26.6891020Z Running tests... 2022-05-18T04:25:26.6891513Z ---------------------------------------------------------------------- 2022-05-18T04:25:28.3077732Z test_custom_stream_multi (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:25:28.3700357Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 7639 2022-05-18T04:25:28.3802726Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 7640 2022-05-18T04:25:28.3907115Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 7641 2022-05-18T04:25:28.4013773Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 7642 2022-05-18T04:25:29.3138232Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiulabau8 2022-05-18T04:25:29.3139683Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiulabau8/_remote_module_non_scriptable.py 2022-05-18T04:25:29.3783062Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpixwi4etz 2022-05-18T04:25:29.3785476Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpixwi4etz/_remote_module_non_scriptable.py 2022-05-18T04:25:29.3900629Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwfymeixq 2022-05-18T04:25:29.3903431Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwfymeixq/_remote_module_non_scriptable.py 2022-05-18T04:25:29.4291949Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbl5kg6on 2022-05-18T04:25:29.4295070Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbl5kg6on/_remote_module_non_scriptable.py 2022-05-18T04:25:29.6741352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:29.7459236Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:25:29.7522265Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:29.7926334Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:25:42.6398877Z ok (15.950s) 2022-05-18T04:25:42.6399163Z 2022-05-18T04:25:42.6399841Z ---------------------------------------------------------------------- 2022-05-18T04:25:42.6400425Z Ran 1 test in 15.951s 2022-05-18T04:25:42.6400705Z 2022-05-18T04:25:42.6400842Z OK 2022-05-18T04:25:42.6401103Z 2022-05-18T04:25:42.6401327Z Generating XML reports... 2022-05-18T04:25:42.6444790Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042526.xml 2022-05-18T04:25:43.8095396Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqt8woyea 2022-05-18T04:25:43.8096311Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqt8woyea/_remote_module_non_scriptable.py 2022-05-18T04:25:44.1681889Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:25:44.1696876Z 2022-05-18T04:25:44.1697297Z Running tests... 2022-05-18T04:25:44.1697838Z ---------------------------------------------------------------------- 2022-05-18T04:25:45.7787443Z test_custom_stream_nested (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:25:45.8416809Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8004 2022-05-18T04:25:45.8518126Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8005 2022-05-18T04:25:45.8621809Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 8006 2022-05-18T04:25:45.8726238Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 8007 2022-05-18T04:25:46.7831455Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcsjgzc0r 2022-05-18T04:25:46.7833565Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcsjgzc0r/_remote_module_non_scriptable.py 2022-05-18T04:25:46.7847899Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt9hy8tio 2022-05-18T04:25:46.7850825Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt9hy8tio/_remote_module_non_scriptable.py 2022-05-18T04:25:46.8210111Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpua4zlqyt 2022-05-18T04:25:46.8212923Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpua4zlqyt/_remote_module_non_scriptable.py 2022-05-18T04:25:46.8585674Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj923832n 2022-05-18T04:25:46.8588521Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj923832n/_remote_module_non_scriptable.py 2022-05-18T04:25:47.1406128Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:25:47.1600443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:25:47.1819732Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:25:47.2175573Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:25:56.1974095Z ok (12.027s) 2022-05-18T04:25:56.1974350Z 2022-05-18T04:25:56.1974756Z ---------------------------------------------------------------------- 2022-05-18T04:25:56.1975107Z Ran 1 test in 12.028s 2022-05-18T04:25:56.1975274Z 2022-05-18T04:25:56.1975376Z OK 2022-05-18T04:25:56.1975532Z 2022-05-18T04:25:56.1975654Z Generating XML reports... 2022-05-18T04:25:56.2018244Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042544.xml 2022-05-18T04:25:57.3535927Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3fghcedr 2022-05-18T04:25:57.3536946Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3fghcedr/_remote_module_non_scriptable.py 2022-05-18T04:25:57.7096589Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:25:57.7110933Z 2022-05-18T04:25:57.7111353Z Running tests... 2022-05-18T04:25:57.7111843Z ---------------------------------------------------------------------- 2022-05-18T04:25:59.2989207Z test_custom_stream_nested_multi (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:25:59.3624314Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8369 2022-05-18T04:25:59.3728282Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8370 2022-05-18T04:25:59.3829861Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 8371 2022-05-18T04:25:59.3933292Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 8372 2022-05-18T04:26:00.3773628Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptnv0wrk6 2022-05-18T04:26:00.3774911Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptnv0wrk6/_remote_module_non_scriptable.py 2022-05-18T04:26:00.3792528Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp445390e 2022-05-18T04:26:00.3795081Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp445390e/_remote_module_non_scriptable.py 2022-05-18T04:26:00.4135871Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjgskmqzh 2022-05-18T04:26:00.4138629Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjgskmqzh/_remote_module_non_scriptable.py 2022-05-18T04:26:00.4246771Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu39na_k6 2022-05-18T04:26:00.4249514Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu39na_k6/_remote_module_non_scriptable.py 2022-05-18T04:26:00.7327223Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:00.7431943Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:00.7729642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:26:00.7959620Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:26:08.1187227Z ok (10.407s) 2022-05-18T04:26:08.1187450Z 2022-05-18T04:26:08.1187888Z ---------------------------------------------------------------------- 2022-05-18T04:26:08.1188261Z Ran 1 test in 10.408s 2022-05-18T04:26:08.1188411Z 2022-05-18T04:26:08.1188515Z OK 2022-05-18T04:26:08.1190254Z 2022-05-18T04:26:08.1190615Z Generating XML reports... 2022-05-18T04:26:08.1235626Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042557.xml 2022-05-18T04:26:09.2923256Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_75nv3g5 2022-05-18T04:26:09.2924232Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_75nv3g5/_remote_module_non_scriptable.py 2022-05-18T04:26:09.6657300Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:26:09.6672699Z 2022-05-18T04:26:09.6673148Z Running tests... 2022-05-18T04:26:09.6673644Z ---------------------------------------------------------------------- 2022-05-18T04:26:11.3105525Z test_device_map_cpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
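[editor's note] test_device_map_cpu and the test_device_map_cpu_to_gpu_* runs that follow exercise the TensorPipe backend's device maps, which tell RPC where argument and return tensors should land on the callee. A configuration-only sketch of that option (the worker name and the 0 -> 1 mapping are made up for illustration; the _cpu variant presumably maps "cpu" entries in the same way):

    from torch.distributed import rpc

    opts = rpc.TensorPipeRpcBackendOptions(num_worker_threads=8)
    # Tensors sent from this worker's cuda:0 land on "worker1"'s cuda:1.
    opts.set_device_map("worker1", {0: 1})

    # Each worker would then pass rpc_backend_options=opts to rpc.init_rpc(...).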
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:26:11.3748260Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 8729 2022-05-18T04:26:11.3851723Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 8730 2022-05-18T04:26:11.3956325Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 8731 2022-05-18T04:26:11.4062098Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 8732 2022-05-18T04:26:12.3373553Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqiex3me5 2022-05-18T04:26:12.3374520Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqiex3me5/_remote_module_non_scriptable.py 2022-05-18T04:26:12.3625165Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjffvytxl 2022-05-18T04:26:12.3627764Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjffvytxl/_remote_module_non_scriptable.py 2022-05-18T04:26:12.3676705Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptrbumooo 2022-05-18T04:26:12.3679524Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptrbumooo/_remote_module_non_scriptable.py 2022-05-18T04:26:12.3852214Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2vzlq9fz 2022-05-18T04:26:12.3855238Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2vzlq9fz/_remote_module_non_scriptable.py 2022-05-18T04:26:12.7061066Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:12.7224503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:12.7262521Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:26:12.7533996Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:26:13.4130107Z ok (3.745s) 2022-05-18T04:26:13.4130336Z 2022-05-18T04:26:13.4130771Z ---------------------------------------------------------------------- 2022-05-18T04:26:13.4131326Z Ran 1 test in 3.746s 2022-05-18T04:26:13.4131496Z 2022-05-18T04:26:13.4131575Z OK 2022-05-18T04:26:13.4131710Z 2022-05-18T04:26:13.4131848Z Generating XML reports... 2022-05-18T04:26:13.4174557Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042609.xml 2022-05-18T04:26:14.5828701Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeen3ck6m 2022-05-18T04:26:14.5829685Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeen3ck6m/_remote_module_non_scriptable.py 2022-05-18T04:26:14.9612974Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:26:14.9628313Z 2022-05-18T04:26:14.9628556Z Running tests... 2022-05-18T04:26:14.9628999Z ---------------------------------------------------------------------- 2022-05-18T04:26:16.6217923Z test_device_map_cpu_to_gpu_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
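test_device_map_cpu_to_gpu_default, announced above, belongs to the device-map family: the caller tells the TensorPipe agent how devices on its side map onto devices on the callee. The exact maps the test uses are not logged; the sketch below shows the public set_device_map API with a CPU-to-GPU mapping, as the test name suggests.

    # Hypothetical device-map configuration suggested by the test name; the
    # real maps used by the test are not visible in this log.
    import torch
    import torch.distributed.rpc as rpc

    options = rpc.TensorPipeRpcBackendOptions()
    # CPU tensors sent to "worker1" should arrive on worker1's default GPU.
    options.set_device_map("worker1", {"cpu": "cuda:0"})

    rpc.init_rpc("worker0", rank=0, world_size=4, rpc_backend_options=options)

    # The argument is a CPU tensor here; the callee sees it on cuda:0.
    ret = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), 1))
    rpc.shutdown()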
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:26:16.6859257Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9078 2022-05-18T04:26:16.6961647Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9079 2022-05-18T04:26:16.7066960Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 9080 2022-05-18T04:26:16.7173375Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 9081 2022-05-18T04:26:17.6052366Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzhwva0ep 2022-05-18T04:26:17.6053573Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzhwva0ep/_remote_module_non_scriptable.py 2022-05-18T04:26:17.6274221Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd6bm28xq 2022-05-18T04:26:17.6276853Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd6bm28xq/_remote_module_non_scriptable.py 2022-05-18T04:26:17.6401424Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvq8_92vv 2022-05-18T04:26:17.6404147Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvq8_92vv/_remote_module_non_scriptable.py 2022-05-18T04:26:17.6597620Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpckagl4s7 2022-05-18T04:26:17.6600356Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpckagl4s7/_remote_module_non_scriptable.py 2022-05-18T04:26:17.9623936Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:18.0018675Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:18.0029529Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:26:18.0095013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:26:21.4304725Z ok (6.467s) 2022-05-18T04:26:21.4305006Z 2022-05-18T04:26:21.4305427Z ---------------------------------------------------------------------- 2022-05-18T04:26:21.4305779Z Ran 1 test in 6.468s 2022-05-18T04:26:21.4305959Z 2022-05-18T04:26:21.4306058Z OK 2022-05-18T04:26:21.4306876Z 2022-05-18T04:26:21.4307045Z Generating XML reports... 2022-05-18T04:26:21.4350599Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042614.xml 2022-05-18T04:26:22.5892811Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq89eowqe 2022-05-18T04:26:22.5893957Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq89eowqe/_remote_module_non_scriptable.py 2022-05-18T04:26:22.9466191Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:26:22.9480918Z 2022-05-18T04:26:22.9481156Z Running tests... 2022-05-18T04:26:22.9481594Z ---------------------------------------------------------------------- 2022-05-18T04:26:24.5654052Z test_device_map_cpu_to_gpu_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
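The repeated instantiator messages ("Created a temporary directory at /tmp/... / Writing .../_remote_module_non_scriptable.py") come from torch.distributed.nn generating its RemoteModule template when the module is imported in each worker process; they are informational, not errors. For orientation only, a minimal RemoteModule usage sketch follows; the tests themselves may never construct one.

    # Illustrative RemoteModule usage; requires an initialized RPC group and is
    # not taken from the tests in this log.
    import torch.nn as nn
    from torch.distributed.nn.api.remote_module import RemoteModule

    # An nn.Linear(4, 4) that lives on worker1's CPU and is driven over RPC.
    remote_linear = RemoteModule("worker1/cpu", nn.Linear, args=(4, 4))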
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:26:24.6287766Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9435 2022-05-18T04:26:24.6392207Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9436 2022-05-18T04:26:24.6499291Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 9437 2022-05-18T04:26:24.6604138Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 9438 2022-05-18T04:26:25.6373990Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpabjx2tbi 2022-05-18T04:26:25.6375125Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpabjx2tbi/_remote_module_non_scriptable.py 2022-05-18T04:26:25.6544486Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1ost0dhc 2022-05-18T04:26:25.6547218Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1ost0dhc/_remote_module_non_scriptable.py 2022-05-18T04:26:25.6644998Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp457ln2zj 2022-05-18T04:26:25.6647773Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp457ln2zj/_remote_module_non_scriptable.py 2022-05-18T04:26:25.6924446Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyd49zthf 2022-05-18T04:26:25.6926920Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyd49zthf/_remote_module_non_scriptable.py 2022-05-18T04:26:25.9930756Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:26:26.0187271Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:26.0326264Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:26.0478489Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:26:29.3740258Z ok (6.426s) 2022-05-18T04:26:29.3740562Z 2022-05-18T04:26:29.3740987Z ---------------------------------------------------------------------- 2022-05-18T04:26:29.3741338Z Ran 1 test in 6.426s 2022-05-18T04:26:29.3741505Z 2022-05-18T04:26:29.3741590Z OK 2022-05-18T04:26:29.3741731Z 2022-05-18T04:26:29.3741894Z Generating XML reports... 2022-05-18T04:26:29.3787761Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042622.xml 2022-05-18T04:26:30.5388062Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy28mrntx 2022-05-18T04:26:30.5389348Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy28mrntx/_remote_module_non_scriptable.py 2022-05-18T04:26:30.9141441Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:26:30.9157289Z 2022-05-18T04:26:30.9157640Z Running tests... 2022-05-18T04:26:30.9158115Z ---------------------------------------------------------------------- 2022-05-18T04:26:32.5833051Z test_device_map_gpu_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:26:32.6477699Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 9792 2022-05-18T04:26:32.6581956Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 9793 2022-05-18T04:26:32.6687144Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 9794 2022-05-18T04:26:32.6793057Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 9795 2022-05-18T04:26:33.5699359Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsymfhrre 2022-05-18T04:26:33.5700549Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsymfhrre/_remote_module_non_scriptable.py 2022-05-18T04:26:33.5719909Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdtktxn13 2022-05-18T04:26:33.5722582Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdtktxn13/_remote_module_non_scriptable.py 2022-05-18T04:26:33.6047939Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9d_0881l 2022-05-18T04:26:33.6049950Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9d_0881l/_remote_module_non_scriptable.py 2022-05-18T04:26:33.6194110Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps7va6eqy 2022-05-18T04:26:33.6196233Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps7va6eqy/_remote_module_non_scriptable.py 2022-05-18T04:26:33.9286894Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:33.9401570Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:26:33.9703375Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:33.9795735Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:26:37.3935727Z ok (6.477s) 2022-05-18T04:26:37.3935966Z 2022-05-18T04:26:37.3936374Z ---------------------------------------------------------------------- 2022-05-18T04:26:37.3936738Z Ran 1 test in 6.478s 2022-05-18T04:26:37.3936908Z 2022-05-18T04:26:37.3936987Z OK 2022-05-18T04:26:37.3937126Z 2022-05-18T04:26:37.3937265Z Generating XML reports... 2022-05-18T04:26:37.3982114Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042630.xml 2022-05-18T04:26:38.5502783Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi74nda0i 2022-05-18T04:26:38.5503713Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi74nda0i/_remote_module_non_scriptable.py 2022-05-18T04:26:38.9078884Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:26:38.9093709Z 2022-05-18T04:26:38.9094050Z Running tests... 2022-05-18T04:26:38.9094540Z ---------------------------------------------------------------------- 2022-05-18T04:26:40.5235237Z test_device_map_gpu_default_to_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
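test_device_map_gpu_default_to_non_default, by its name, maps the caller's default GPU onto a non-default GPU on the callee. A plausible single-entry map is sketched below; this is an assumption, since the log does not show the actual configuration.

    # Guess at the kind of map the test name describes, not the test's own code.
    import torch.distributed.rpc as rpc

    options = rpc.TensorPipeRpcBackendOptions()
    options.set_device_map("worker1", {"cuda:0": "cuda:1"})  # default GPU -> non-default GPU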
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:26:40.5859352Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10145 2022-05-18T04:26:40.5960664Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10146 2022-05-18T04:26:40.6066709Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 10147 2022-05-18T04:26:40.6171481Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 10148 2022-05-18T04:26:41.4909769Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5qmpys71 2022-05-18T04:26:41.4911235Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5qmpys71/_remote_module_non_scriptable.py 2022-05-18T04:26:41.5155899Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0gopy3rz 2022-05-18T04:26:41.5157954Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0gopy3rz/_remote_module_non_scriptable.py 2022-05-18T04:26:41.5294993Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph0vl8g8q 2022-05-18T04:26:41.5297273Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph0vl8g8q/_remote_module_non_scriptable.py 2022-05-18T04:26:41.5670865Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8n8twsd1 2022-05-18T04:26:41.5673113Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8n8twsd1/_remote_module_non_scriptable.py 2022-05-18T04:26:41.8465914Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:41.8694311Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:26:41.8877641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:26:41.9446006Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:47.5376362Z ok (8.628s) 2022-05-18T04:26:47.5376590Z 2022-05-18T04:26:47.5377023Z ---------------------------------------------------------------------- 2022-05-18T04:26:47.5377375Z Ran 1 test in 8.628s 2022-05-18T04:26:47.5377549Z 2022-05-18T04:26:47.5377627Z OK 2022-05-18T04:26:47.5377769Z 2022-05-18T04:26:47.5377902Z Generating XML reports... 2022-05-18T04:26:47.5420563Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042638.xml 2022-05-18T04:26:48.6933839Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbmp_d1gg 2022-05-18T04:26:48.6935259Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbmp_d1gg/_remote_module_non_scriptable.py 2022-05-18T04:26:49.0528781Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:26:49.0543310Z 2022-05-18T04:26:49.0543511Z Running tests... 2022-05-18T04:26:49.0544134Z ---------------------------------------------------------------------- 2022-05-18T04:26:50.6587360Z test_device_map_gpu_mixed_1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
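The test_device_map_gpu_mixed_* variants presumably exercise maps with several entries at once, for example swapping two GPUs between caller and callee. Again, this is a guess at the shape of such a configuration rather than the test's actual parameters.

    # Hypothetical multi-entry ("mixed") device map; integer keys and values
    # denote CUDA device indices.
    import torch.distributed.rpc as rpc

    options = rpc.TensorPipeRpcBackendOptions()
    options.set_device_map("worker1", {0: 1, 1: 0})  # cuda:0 <-> cuda:1 are swapped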
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:26:50.7225628Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10510 2022-05-18T04:26:50.7326823Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10511 2022-05-18T04:26:50.7433773Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 10512 2022-05-18T04:26:50.7542233Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 10513 2022-05-18T04:26:51.6724053Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvjy8kz91 2022-05-18T04:26:51.6725018Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvjy8kz91/_remote_module_non_scriptable.py 2022-05-18T04:26:51.7386320Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfcw6lvj3 2022-05-18T04:26:51.7387908Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfcw6lvj3/_remote_module_non_scriptable.py 2022-05-18T04:26:51.7474713Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmposl233x1 2022-05-18T04:26:51.7477372Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmposl233x1/_remote_module_non_scriptable.py 2022-05-18T04:26:51.7701211Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkqp3lbp7 2022-05-18T04:26:51.7703679Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkqp3lbp7/_remote_module_non_scriptable.py 2022-05-18T04:26:52.0320064Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:26:52.1041450Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:26:52.1228330Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:26:52.1339078Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:26:57.7728978Z ok (8.718s) 2022-05-18T04:26:57.7729364Z 2022-05-18T04:26:57.7730019Z ---------------------------------------------------------------------- 2022-05-18T04:26:57.7730947Z Ran 1 test in 8.718s 2022-05-18T04:26:57.7731248Z 2022-05-18T04:26:57.7731420Z OK 2022-05-18T04:26:57.7731655Z 2022-05-18T04:26:57.7731884Z Generating XML reports... 2022-05-18T04:26:57.7772749Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042649.xml 2022-05-18T04:26:58.9451927Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp73h940f0 2022-05-18T04:26:58.9454270Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp73h940f0/_remote_module_non_scriptable.py 2022-05-18T04:26:59.3150595Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:26:59.3166149Z 2022-05-18T04:26:59.3166555Z Running tests... 2022-05-18T04:26:59.3167063Z ---------------------------------------------------------------------- 2022-05-18T04:27:00.9717017Z test_device_map_gpu_mixed_2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:27:01.0359336Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 10867 2022-05-18T04:27:01.0464066Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 10868 2022-05-18T04:27:01.0572086Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 10869 2022-05-18T04:27:01.0679046Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 10870 2022-05-18T04:27:01.9368829Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcwgi_mz_ 2022-05-18T04:27:01.9369441Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcwgi_mz_/_remote_module_non_scriptable.py 2022-05-18T04:27:01.9784435Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplaxnbi1k 2022-05-18T04:27:01.9785494Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplaxnbi1k/_remote_module_non_scriptable.py 2022-05-18T04:27:02.0032108Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnbxr1evm 2022-05-18T04:27:02.0034440Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnbxr1evm/_remote_module_non_scriptable.py 2022-05-18T04:27:02.0581056Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc22rig70 2022-05-18T04:27:02.0583836Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc22rig70/_remote_module_non_scriptable.py 2022-05-18T04:27:02.2957103Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:27:02.3354859Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:02.3555432Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:02.4334877Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:27:07.9863308Z ok (8.669s) 2022-05-18T04:27:07.9863517Z 2022-05-18T04:27:07.9863925Z ---------------------------------------------------------------------- 2022-05-18T04:27:07.9864271Z Ran 1 test in 8.670s 2022-05-18T04:27:07.9864437Z 2022-05-18T04:27:07.9864533Z OK 2022-05-18T04:27:07.9865449Z 2022-05-18T04:27:07.9865577Z Generating XML reports... 2022-05-18T04:27:07.9909394Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042659.xml 2022-05-18T04:27:09.1671647Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9nyk5aer 2022-05-18T04:27:09.1673079Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9nyk5aer/_remote_module_non_scriptable.py 2022-05-18T04:27:09.5378936Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:27:09.5394677Z 2022-05-18T04:27:09.5394955Z Running tests... 2022-05-18T04:27:09.5395381Z ---------------------------------------------------------------------- 2022-05-18T04:27:11.1953449Z test_device_map_gpu_mixed_3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:27:11.2594474Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11224 2022-05-18T04:27:11.2698800Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11225 2022-05-18T04:27:11.2805034Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 11226 2022-05-18T04:27:11.2912800Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 11227 2022-05-18T04:27:12.1447427Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0qrsqixd 2022-05-18T04:27:12.1448629Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0qrsqixd/_remote_module_non_scriptable.py 2022-05-18T04:27:12.2139890Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkhj68w65 2022-05-18T04:27:12.2141546Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkhj68w65/_remote_module_non_scriptable.py 2022-05-18T04:27:12.2300244Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvysj2068 2022-05-18T04:27:12.2303707Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvysj2068/_remote_module_non_scriptable.py 2022-05-18T04:27:12.2304842Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4cwo_jsr 2022-05-18T04:27:12.2305929Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4cwo_jsr/_remote_module_non_scriptable.py 2022-05-18T04:27:12.4989817Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:12.5708161Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:27:12.5905271Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:27:12.6082311Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:18.2108734Z ok (8.671s) 2022-05-18T04:27:18.2108959Z 2022-05-18T04:27:18.2109371Z ---------------------------------------------------------------------- 2022-05-18T04:27:18.2109734Z Ran 1 test in 8.671s 2022-05-18T04:27:18.2109883Z 2022-05-18T04:27:18.2109979Z OK 2022-05-18T04:27:18.2110117Z 2022-05-18T04:27:18.2110274Z Generating XML reports... 2022-05-18T04:27:18.2154456Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042709.xml 2022-05-18T04:27:19.3689740Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvrrna_qd 2022-05-18T04:27:19.3690961Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvrrna_qd/_remote_module_non_scriptable.py 2022-05-18T04:27:19.7265265Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:27:19.7280232Z 2022-05-18T04:27:19.7280686Z Running tests... 2022-05-18T04:27:19.7281184Z ---------------------------------------------------------------------- 2022-05-18T04:27:21.3340555Z test_device_map_gpu_mixed_4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
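Each run finishes with "Generating XML reports..." and a JUnit-style file under test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent. The log does not reveal which runner writes these files; the sketch below uses the third-party unittest-xml-reporting package purely to show how such per-run XML is typically produced.

    # Generic way to emit JUnit-style XML from a unittest suite; an assumption,
    # not necessarily the runner PyTorch's own test harness uses.
    import unittest
    import xmlrunner  # pip install unittest-xml-reporting

    if __name__ == "__main__":
        unittest.main(
            testRunner=xmlrunner.XMLTestRunner(
                output="test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent"
            ),
            verbosity=2,
        )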
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:27:21.3965106Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11581 2022-05-18T04:27:21.4067876Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11582 2022-05-18T04:27:21.4173841Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 11583 2022-05-18T04:27:21.4277718Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 11584 2022-05-18T04:27:22.3124363Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl3wgflse 2022-05-18T04:27:22.3125083Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl3wgflse/_remote_module_non_scriptable.py 2022-05-18T04:27:22.3148499Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbg9gi269 2022-05-18T04:27:22.3151293Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbg9gi269/_remote_module_non_scriptable.py 2022-05-18T04:27:22.3188982Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_3c2pbar 2022-05-18T04:27:22.3189529Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2otu9kbe 2022-05-18T04:27:22.3191638Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_3c2pbar/_remote_module_non_scriptable.py 2022-05-18T04:27:22.3192192Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2otu9kbe/_remote_module_non_scriptable.py 2022-05-18T04:27:22.6682663Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:27:22.6715701Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:22.6730237Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:27:22.6872705Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:28.3463559Z ok (8.618s) 2022-05-18T04:27:28.3463783Z 2022-05-18T04:27:28.3464192Z ---------------------------------------------------------------------- 2022-05-18T04:27:28.3464551Z Ran 1 test in 8.618s 2022-05-18T04:27:28.3464872Z 2022-05-18T04:27:28.3465027Z OK 2022-05-18T04:27:28.3465232Z 2022-05-18T04:27:28.3465458Z Generating XML reports... 2022-05-18T04:27:28.3508984Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042719.xml 2022-05-18T04:27:29.5052257Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpures1wa2 2022-05-18T04:27:29.5053397Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpures1wa2/_remote_module_non_scriptable.py 2022-05-18T04:27:29.8723373Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:27:29.8738418Z 2022-05-18T04:27:29.8738706Z Running tests... 2022-05-18T04:27:29.8739345Z ---------------------------------------------------------------------- 2022-05-18T04:27:31.5229562Z test_device_map_gpu_mixed_5 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:27:31.5847815Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 11938 2022-05-18T04:27:31.5950335Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 11939 2022-05-18T04:27:31.6056860Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 11940 2022-05-18T04:27:31.6162269Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 11941 2022-05-18T04:27:32.5963314Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzavs7f5h 2022-05-18T04:27:32.5964368Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzavs7f5h/_remote_module_non_scriptable.py 2022-05-18T04:27:32.5975945Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphx_9q1wt 2022-05-18T04:27:32.5976471Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzvzujnmu 2022-05-18T04:27:32.5978611Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphx_9q1wt/_remote_module_non_scriptable.py 2022-05-18T04:27:32.5979445Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzvzujnmu/_remote_module_non_scriptable.py 2022-05-18T04:27:32.6390173Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpozylugm1 2022-05-18T04:27:32.6392319Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpozylugm1/_remote_module_non_scriptable.py 2022-05-18T04:27:32.9496850Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:27:32.9502110Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:32.9509470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:33.0051166Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:27:38.6344222Z ok (8.760s) 2022-05-18T04:27:38.6344457Z 2022-05-18T04:27:38.6344863Z ---------------------------------------------------------------------- 2022-05-18T04:27:38.6345216Z Ran 1 test in 8.761s 2022-05-18T04:27:38.6345384Z 2022-05-18T04:27:38.6345481Z OK 2022-05-18T04:27:38.6345623Z 2022-05-18T04:27:38.6345758Z Generating XML reports... 2022-05-18T04:27:38.6389465Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042729.xml 2022-05-18T04:27:39.8000830Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl0i_ncks 2022-05-18T04:27:39.8002056Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl0i_ncks/_remote_module_non_scriptable.py 2022-05-18T04:27:40.1708319Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:27:40.1723755Z 2022-05-18T04:27:40.1724086Z Running tests... 2022-05-18T04:27:40.1724568Z ---------------------------------------------------------------------- 2022-05-18T04:27:41.8287054Z test_device_map_gpu_mixed_6 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
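If you need to inspect one of the generated reports, they are plain JUnit XML; a short reader using only the standard library is sketched below. The file name is copied from the log line above; adjust it for the report you care about.

    # Read one of the JUnit XML reports named in the log and print per-test timing.
    import xml.etree.ElementTree as ET

    path = (
        "test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/"
        "TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042729.xml"
    )
    root = ET.parse(path).getroot()
    for case in root.iter("testcase"):
        print(case.get("classname"), case.get("name"), case.get("time"))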
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:27:41.8928247Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12303 2022-05-18T04:27:41.9032958Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12304 2022-05-18T04:27:41.9140054Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 12305 2022-05-18T04:27:41.9247330Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 12306 2022-05-18T04:27:42.7962416Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp064kaj7w 2022-05-18T04:27:42.7963594Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp064kaj7w/_remote_module_non_scriptable.py 2022-05-18T04:27:42.8000418Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpewc427z1 2022-05-18T04:27:42.8002606Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpewc427z1/_remote_module_non_scriptable.py 2022-05-18T04:27:42.8037399Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphra3qgor 2022-05-18T04:27:42.8040288Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphra3qgor/_remote_module_non_scriptable.py 2022-05-18T04:27:42.8512260Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2qlmi4od 2022-05-18T04:27:42.8514000Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2qlmi4od/_remote_module_non_scriptable.py 2022-05-18T04:27:43.1482922Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:43.1620141Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:27:43.1631949Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:43.2050498Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:27:48.8431723Z ok (8.670s) 2022-05-18T04:27:48.8432103Z 2022-05-18T04:27:48.8432531Z ---------------------------------------------------------------------- 2022-05-18T04:27:48.8432882Z Ran 1 test in 8.671s 2022-05-18T04:27:48.8433112Z 2022-05-18T04:27:48.8433274Z OK 2022-05-18T04:27:48.8433716Z 2022-05-18T04:27:48.8433919Z Generating XML reports... 2022-05-18T04:27:48.8476895Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042740.xml 2022-05-18T04:27:50.0186963Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi6d2gp7w 2022-05-18T04:27:50.0188794Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi6d2gp7w/_remote_module_non_scriptable.py 2022-05-18T04:27:50.3957386Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:27:50.3973452Z 2022-05-18T04:27:50.3973923Z Running tests... 2022-05-18T04:27:50.3974451Z ---------------------------------------------------------------------- 2022-05-18T04:27:52.0596413Z test_device_map_gpu_mixed_7 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:27:52.1240511Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 12668 2022-05-18T04:27:52.1343771Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 12669 2022-05-18T04:27:52.1449031Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 12670 2022-05-18T04:27:52.1554529Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 12671 2022-05-18T04:27:53.1058266Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6h7ltt2c 2022-05-18T04:27:53.1059334Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6h7ltt2c/_remote_module_non_scriptable.py 2022-05-18T04:27:53.1111248Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqd3l8n3j 2022-05-18T04:27:53.1114030Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqd3l8n3j/_remote_module_non_scriptable.py 2022-05-18T04:27:53.1498769Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp663_p5o9 2022-05-18T04:27:53.1501407Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp663_p5o9/_remote_module_non_scriptable.py 2022-05-18T04:27:53.1793725Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppqap8oza 2022-05-18T04:27:53.1796644Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppqap8oza/_remote_module_non_scriptable.py 2022-05-18T04:27:53.4631512Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:27:53.4681057Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:27:53.5061734Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:27:53.5452928Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:27:59.2771236Z ok (8.879s) 2022-05-18T04:27:59.2771483Z 2022-05-18T04:27:59.2771895Z ---------------------------------------------------------------------- 2022-05-18T04:27:59.2772238Z Ran 1 test in 8.880s 2022-05-18T04:27:59.2772417Z 2022-05-18T04:27:59.2772518Z OK 2022-05-18T04:27:59.2772659Z 2022-05-18T04:27:59.2772819Z Generating XML reports... 2022-05-18T04:27:59.2817951Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042750.xml 2022-05-18T04:28:00.4322165Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd5r9ifeq 2022-05-18T04:28:00.4323381Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd5r9ifeq/_remote_module_non_scriptable.py 2022-05-18T04:28:00.7933979Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:28:00.7948726Z 2022-05-18T04:28:00.7948965Z Running tests... 2022-05-18T04:28:00.7949415Z ---------------------------------------------------------------------- 2022-05-18T04:28:02.4339631Z test_device_map_gpu_mixed_8 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:28:02.4959645Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13033 2022-05-18T04:28:02.5062108Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13034 2022-05-18T04:28:02.5168124Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 13035 2022-05-18T04:28:02.5276615Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 13036 2022-05-18T04:28:03.4030252Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_8ru6nm8 2022-05-18T04:28:03.4030875Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmkb6enl3 2022-05-18T04:28:03.4031452Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_8ru6nm8/_remote_module_non_scriptable.py 2022-05-18T04:28:03.4032206Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmkb6enl3/_remote_module_non_scriptable.py 2022-05-18T04:28:03.4254518Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmk850ayi 2022-05-18T04:28:03.4257112Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmk850ayi/_remote_module_non_scriptable.py 2022-05-18T04:28:03.4545141Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpih2sdfl8 2022-05-18T04:28:03.4547774Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpih2sdfl8/_remote_module_non_scriptable.py 2022-05-18T04:28:03.7545868Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:28:03.7622201Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:28:03.7808787Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:03.8067144Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:09.4460216Z ok (8.651s) 2022-05-18T04:28:09.4460545Z 2022-05-18T04:28:09.4461048Z ---------------------------------------------------------------------- 2022-05-18T04:28:09.4461427Z Ran 1 test in 8.651s 2022-05-18T04:28:09.4461600Z 2022-05-18T04:28:09.4464329Z OK 2022-05-18T04:28:09.4464508Z 2022-05-18T04:28:09.4464660Z Generating XML reports... 2022-05-18T04:28:09.4506382Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042800.xml 2022-05-18T04:28:10.6163451Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpihbxtwtz 2022-05-18T04:28:10.6164844Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpihbxtwtz/_remote_module_non_scriptable.py 2022-05-18T04:28:10.9855555Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:28:10.9870898Z 2022-05-18T04:28:10.9871047Z Running tests... 2022-05-18T04:28:10.9871772Z ---------------------------------------------------------------------- 2022-05-18T04:28:12.6434435Z test_device_map_gpu_mixed_self_1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
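The test_device_map_gpu_mixed_self_* family differs from the previous one in that, judging by the names, the device map targets the caller's own worker name, so RPCs a worker sends to itself also go through the mapping. A hedged sketch of that configuration:

    # Hypothetical "self" device map: worker0 maps devices for calls to itself.
    import torch
    import torch.distributed.rpc as rpc

    options = rpc.TensorPipeRpcBackendOptions()
    options.set_device_map("worker0", {"cuda:0": "cuda:1"})

    rpc.init_rpc("worker0", rank=0, world_size=4, rpc_backend_options=options)
    # A self-RPC: the cuda:0 argument shows up on cuda:1 inside the call.
    out = rpc.rpc_sync("worker0", torch.add, args=(torch.ones(2, device="cuda:0"), 1))
    rpc.shutdown()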
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:28:12.7080632Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13398 2022-05-18T04:28:12.7184417Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13399 2022-05-18T04:28:12.7290152Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 13400 2022-05-18T04:28:12.7397947Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 13401 2022-05-18T04:28:13.6410414Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6bundjur 2022-05-18T04:28:13.6411688Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6bundjur/_remote_module_non_scriptable.py 2022-05-18T04:28:13.6462087Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp21s0ligy 2022-05-18T04:28:13.6464809Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp21s0ligy/_remote_module_non_scriptable.py 2022-05-18T04:28:13.6735006Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppnmwrfjn 2022-05-18T04:28:13.6737445Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppnmwrfjn/_remote_module_non_scriptable.py 2022-05-18T04:28:13.6746100Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnbhpjmrr 2022-05-18T04:28:13.6749047Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnbhpjmrr/_remote_module_non_scriptable.py 2022-05-18T04:28:13.9992071Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:28:14.0122878Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:14.0335951Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:14.0340615Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:28:19.6583447Z ok (8.671s) 2022-05-18T04:28:19.6583678Z 2022-05-18T04:28:19.6584105Z ---------------------------------------------------------------------- 2022-05-18T04:28:19.6584457Z Ran 1 test in 8.671s 2022-05-18T04:28:19.6584610Z 2022-05-18T04:28:19.6584716Z OK 2022-05-18T04:28:19.6584854Z 2022-05-18T04:28:19.6584989Z Generating XML reports... 2022-05-18T04:28:19.6628260Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042810.xml 2022-05-18T04:28:20.8184786Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2bxozv6v 2022-05-18T04:28:20.8185691Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2bxozv6v/_remote_module_non_scriptable.py 2022-05-18T04:28:21.1769781Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:28:21.1784688Z 2022-05-18T04:28:21.1784934Z Running tests... 2022-05-18T04:28:21.1785388Z ---------------------------------------------------------------------- 2022-05-18T04:28:22.8022441Z test_device_map_gpu_mixed_self_2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:28:22.8655551Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 13755 2022-05-18T04:28:22.8759429Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 13756 2022-05-18T04:28:22.8866241Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 13757 2022-05-18T04:28:22.8970940Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 13758 2022-05-18T04:28:23.7752873Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpemcb2o07 2022-05-18T04:28:23.7753955Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpemcb2o07/_remote_module_non_scriptable.py 2022-05-18T04:28:23.7805932Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq22xmmy3 2022-05-18T04:28:23.7808751Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq22xmmy3/_remote_module_non_scriptable.py 2022-05-18T04:28:23.7862305Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx1vpw__u 2022-05-18T04:28:23.7865207Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx1vpw__u/_remote_module_non_scriptable.py 2022-05-18T04:28:23.8027939Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpog5rbgx5 2022-05-18T04:28:23.8030736Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpog5rbgx5/_remote_module_non_scriptable.py 2022-05-18T04:28:24.1335380Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:24.1404153Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:24.1531562Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:28:24.1644316Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:28:29.8187908Z ok (8.640s) 2022-05-18T04:28:29.8188146Z 2022-05-18T04:28:29.8188560Z ---------------------------------------------------------------------- 2022-05-18T04:28:29.8188898Z Ran 1 test in 8.640s 2022-05-18T04:28:29.8189077Z 2022-05-18T04:28:29.8189179Z OK 2022-05-18T04:28:29.8189322Z 2022-05-18T04:28:29.8189466Z Generating XML reports... 2022-05-18T04:28:29.8234792Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042821.xml 2022-05-18T04:28:31.0005789Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaufy1oo8 2022-05-18T04:28:31.0007090Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaufy1oo8/_remote_module_non_scriptable.py 2022-05-18T04:28:31.3767560Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:28:31.3783712Z 2022-05-18T04:28:31.3783997Z Running tests... 2022-05-18T04:28:31.3784444Z ---------------------------------------------------------------------- 2022-05-18T04:28:33.0097440Z test_device_map_gpu_mixed_self_3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:28:33.0752695Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14112 2022-05-18T04:28:33.0856386Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14113 2022-05-18T04:28:33.0960714Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 14114 2022-05-18T04:28:33.1069134Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 14115 2022-05-18T04:28:33.9831466Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1k0i5q5u 2022-05-18T04:28:33.9832613Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1k0i5q5u/_remote_module_non_scriptable.py 2022-05-18T04:28:33.9840373Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgzy1y2v_ 2022-05-18T04:28:33.9843391Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgzy1y2v_/_remote_module_non_scriptable.py 2022-05-18T04:28:33.9862807Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3wp1vsp7 2022-05-18T04:28:33.9864936Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3wp1vsp7/_remote_module_non_scriptable.py 2022-05-18T04:28:33.9917314Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl0y9f0gq 2022-05-18T04:28:33.9919695Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl0y9f0gq/_remote_module_non_scriptable.py 2022-05-18T04:28:34.3363169Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:34.3380956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:28:34.3397212Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:34.3555480Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:28:39.9257571Z ok (8.547s) 2022-05-18T04:28:39.9257860Z 2022-05-18T04:28:39.9258270Z ---------------------------------------------------------------------- 2022-05-18T04:28:39.9258625Z Ran 1 test in 8.547s 2022-05-18T04:28:39.9258774Z 2022-05-18T04:28:39.9258873Z OK 2022-05-18T04:28:39.9259016Z 2022-05-18T04:28:39.9259156Z Generating XML reports... 2022-05-18T04:28:39.9302108Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042831.xml 2022-05-18T04:28:41.0791013Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp10zppqrx 2022-05-18T04:28:41.0792195Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp10zppqrx/_remote_module_non_scriptable.py 2022-05-18T04:28:41.4397883Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:28:41.4412046Z 2022-05-18T04:28:41.4412328Z Running tests... 2022-05-18T04:28:41.4412782Z ---------------------------------------------------------------------- 2022-05-18T04:28:43.0433252Z test_device_map_gpu_mixed_self_4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
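For completeness: the TensorPipe RPC API these tests build on also includes remote references. The snippet below is a generic illustration of that public API, not code taken from the tests in this log.

    # Generic RRef round trip; illustrative only, assumes an initialized RPC group.
    import torch
    import torch.distributed.rpc as rpc

    rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))
    value = rref.to_here()  # fetches the result back to the caller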
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:28:43.1064913Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14469 2022-05-18T04:28:43.1166449Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14470 2022-05-18T04:28:43.1272733Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 14471 2022-05-18T04:28:43.1380796Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 14472 2022-05-18T04:28:44.0063894Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp391m109d 2022-05-18T04:28:44.0064745Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp391m109d/_remote_module_non_scriptable.py 2022-05-18T04:28:44.0424296Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8f49m15m 2022-05-18T04:28:44.0425782Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8f49m15m/_remote_module_non_scriptable.py 2022-05-18T04:28:44.0449197Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn7he5svz 2022-05-18T04:28:44.0452395Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn7he5svz/_remote_module_non_scriptable.py 2022-05-18T04:28:44.0853983Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt2s17yby 2022-05-18T04:28:44.0856578Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt2s17yby/_remote_module_non_scriptable.py 2022-05-18T04:28:44.3619870Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:28:44.3944122Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:44.3987864Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:28:44.4542570Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:50.0566225Z ok (8.615s) 2022-05-18T04:28:50.0566457Z 2022-05-18T04:28:50.0566856Z ---------------------------------------------------------------------- 2022-05-18T04:28:50.0567211Z Ran 1 test in 8.615s 2022-05-18T04:28:50.0567378Z 2022-05-18T04:28:50.0567475Z OK 2022-05-18T04:28:50.0567614Z 2022-05-18T04:28:50.0567770Z Generating XML reports... 2022-05-18T04:28:50.0609764Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042841.xml 2022-05-18T04:28:51.2183637Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphka1ijo_ 2022-05-18T04:28:51.2184651Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphka1ijo_/_remote_module_non_scriptable.py 2022-05-18T04:28:51.5838601Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:28:51.5853688Z 2022-05-18T04:28:51.5853815Z Running tests... 2022-05-18T04:28:51.5854270Z ---------------------------------------------------------------------- 2022-05-18T04:28:53.2089101Z test_device_map_gpu_mixed_self_5 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:28:53.2720318Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 14826 2022-05-18T04:28:53.2823262Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 14827 2022-05-18T04:28:53.2928610Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 14828 2022-05-18T04:28:53.3033046Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 14829 2022-05-18T04:28:54.2265096Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmgji9a_3 2022-05-18T04:28:54.2265768Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkj9r2ufe 2022-05-18T04:28:54.2266326Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmgji9a_3/_remote_module_non_scriptable.py 2022-05-18T04:28:54.2268815Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkj9r2ufe/_remote_module_non_scriptable.py 2022-05-18T04:28:54.2356430Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg8yt0pv5 2022-05-18T04:28:54.2359124Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg8yt0pv5/_remote_module_non_scriptable.py 2022-05-18T04:28:54.2568019Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdn5ej7xi 2022-05-18T04:28:54.2570685Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdn5ej7xi/_remote_module_non_scriptable.py 2022-05-18T04:28:54.5788833Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:28:54.5793477Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:28:54.5921543Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:28:54.6212680Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:29:00.2215646Z ok (8.636s) 2022-05-18T04:29:00.2215881Z 2022-05-18T04:29:00.2216295Z ---------------------------------------------------------------------- 2022-05-18T04:29:00.2216629Z Ran 1 test in 8.636s 2022-05-18T04:29:00.2216822Z 2022-05-18T04:29:00.2216921Z OK 2022-05-18T04:29:00.2217058Z 2022-05-18T04:29:00.2217195Z Generating XML reports... 2022-05-18T04:29:00.2261036Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042851.xml 2022-05-18T04:29:01.3919999Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg8rfxbgs 2022-05-18T04:29:01.3921358Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg8rfxbgs/_remote_module_non_scriptable.py 2022-05-18T04:29:01.7641540Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:29:01.7657086Z 2022-05-18T04:29:01.7657319Z Running tests... 2022-05-18T04:29:01.7657774Z ---------------------------------------------------------------------- 2022-05-18T04:29:03.4187335Z test_device_map_gpu_mixed_self_6 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:29:03.4827152Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15183 2022-05-18T04:29:03.4931906Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15184 2022-05-18T04:29:03.5038403Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 15185 2022-05-18T04:29:03.5146906Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 15186 2022-05-18T04:29:04.4225849Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbwenfqia 2022-05-18T04:29:04.4226446Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiiedm2i6 2022-05-18T04:29:04.4227005Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbwenfqia/_remote_module_non_scriptable.py 2022-05-18T04:29:04.4227805Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiiedm2i6/_remote_module_non_scriptable.py 2022-05-18T04:29:04.4294836Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpybwk9xl1 2022-05-18T04:29:04.4297307Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpybwk9xl1/_remote_module_non_scriptable.py 2022-05-18T04:29:04.4341515Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp69qi5b40 2022-05-18T04:29:04.4344347Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp69qi5b40/_remote_module_non_scriptable.py 2022-05-18T04:29:04.7795099Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:04.7816465Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:29:04.7855505Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:29:04.7925086Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:10.5334889Z ok (8.767s) 2022-05-18T04:29:10.5335138Z 2022-05-18T04:29:10.5335574Z ---------------------------------------------------------------------- 2022-05-18T04:29:10.5335939Z Ran 1 test in 8.768s 2022-05-18T04:29:10.5336112Z 2022-05-18T04:29:10.5336224Z OK 2022-05-18T04:29:10.5336365Z 2022-05-18T04:29:10.5336506Z Generating XML reports... 2022-05-18T04:29:10.5381427Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042901.xml 2022-05-18T04:29:11.7139545Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpet76oha6 2022-05-18T04:29:11.7140603Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpet76oha6/_remote_module_non_scriptable.py 2022-05-18T04:29:12.0876569Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:29:12.0891671Z 2022-05-18T04:29:12.0891840Z Running tests... 2022-05-18T04:29:12.0892293Z ---------------------------------------------------------------------- 2022-05-18T04:29:13.7495085Z test_device_map_gpu_mixed_self_7 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:29:13.8137953Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15540 2022-05-18T04:29:13.8240504Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15541 2022-05-18T04:29:13.8349477Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 15542 2022-05-18T04:29:13.8455697Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 15543 2022-05-18T04:29:14.7005458Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc8w24jdj 2022-05-18T04:29:14.7006600Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc8w24jdj/_remote_module_non_scriptable.py 2022-05-18T04:29:14.7491272Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn1dgkaio 2022-05-18T04:29:14.7493807Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn1dgkaio/_remote_module_non_scriptable.py 2022-05-18T04:29:14.7500069Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwz3j8azb 2022-05-18T04:29:14.7502309Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwz3j8azb/_remote_module_non_scriptable.py 2022-05-18T04:29:14.7538973Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg_nwuq14 2022-05-18T04:29:14.7541157Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg_nwuq14/_remote_module_non_scriptable.py 2022-05-18T04:29:15.0576946Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:29:15.1052360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:15.1064632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:29:15.1188302Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:20.7641640Z ok (8.675s) 2022-05-18T04:29:20.7641880Z 2022-05-18T04:29:20.7642380Z ---------------------------------------------------------------------- 2022-05-18T04:29:20.7642865Z Ran 1 test in 8.675s 2022-05-18T04:29:20.7643037Z 2022-05-18T04:29:20.7643117Z OK 2022-05-18T04:29:20.7643259Z 2022-05-18T04:29:20.7643395Z Generating XML reports... 2022-05-18T04:29:20.7687121Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042912.xml 2022-05-18T04:29:21.9437903Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu2a8zg80 2022-05-18T04:29:21.9439022Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu2a8zg80/_remote_module_non_scriptable.py 2022-05-18T04:29:22.3288066Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:29:22.3304278Z 2022-05-18T04:29:22.3304715Z Running tests... 2022-05-18T04:29:22.3305244Z ---------------------------------------------------------------------- 2022-05-18T04:29:23.9774553Z test_device_map_gpu_mixed_self_8 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:29:24.0417713Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 15897 2022-05-18T04:29:24.0520122Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 15898 2022-05-18T04:29:24.0627606Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 15899 2022-05-18T04:29:24.0736125Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 15900 2022-05-18T04:29:24.9406713Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpudzp0p18 2022-05-18T04:29:24.9407891Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpudzp0p18/_remote_module_non_scriptable.py 2022-05-18T04:29:24.9476997Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnffq8ipb 2022-05-18T04:29:24.9479400Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnffq8ipb/_remote_module_non_scriptable.py 2022-05-18T04:29:24.9551871Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdm7wzkmh 2022-05-18T04:29:24.9554271Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdm7wzkmh/_remote_module_non_scriptable.py 2022-05-18T04:29:24.9854489Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplhq_outd 2022-05-18T04:29:24.9856885Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplhq_outd/_remote_module_non_scriptable.py 2022-05-18T04:29:25.2931578Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:25.3044333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:29:25.3303765Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:25.3439055Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:29:30.9922528Z ok (8.661s) 2022-05-18T04:29:30.9922746Z 2022-05-18T04:29:30.9923159Z ---------------------------------------------------------------------- 2022-05-18T04:29:30.9923506Z Ran 1 test in 8.662s 2022-05-18T04:29:30.9923678Z 2022-05-18T04:29:30.9923777Z OK 2022-05-18T04:29:30.9924822Z 2022-05-18T04:29:30.9926914Z Generating XML reports... 2022-05-18T04:29:30.9969200Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042922.xml 2022-05-18T04:29:32.1301837Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp58_q9k3f 2022-05-18T04:29:32.1304000Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp58_q9k3f/_remote_module_non_scriptable.py 2022-05-18T04:29:32.5069301Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:29:32.5084397Z 2022-05-18T04:29:32.5084637Z Running tests... 2022-05-18T04:29:32.5085081Z ---------------------------------------------------------------------- 2022-05-18T04:29:34.1475794Z test_device_map_gpu_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:29:34.2107752Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16254 2022-05-18T04:29:34.2212895Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16255 2022-05-18T04:29:34.2318096Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 16256 2022-05-18T04:29:34.2423794Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 16257 2022-05-18T04:29:35.1298187Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmfhs8r5v 2022-05-18T04:29:35.1299405Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmfhs8r5v/_remote_module_non_scriptable.py 2022-05-18T04:29:35.1299984Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt0vn685j 2022-05-18T04:29:35.1302554Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt0vn685j/_remote_module_non_scriptable.py 2022-05-18T04:29:35.1338466Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg3ivu96k 2022-05-18T04:29:35.1341275Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg3ivu96k/_remote_module_non_scriptable.py 2022-05-18T04:29:35.1766071Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbfnzlsg4 2022-05-18T04:29:35.1768852Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbfnzlsg4/_remote_module_non_scriptable.py 2022-05-18T04:29:35.4864702Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:29:35.4887199Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:35.5030399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:35.5367800Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:29:38.8562297Z ok (6.347s) 2022-05-18T04:29:38.8562533Z 2022-05-18T04:29:38.8562932Z ---------------------------------------------------------------------- 2022-05-18T04:29:38.8563284Z Ran 1 test in 6.348s 2022-05-18T04:29:38.8563458Z 2022-05-18T04:29:38.8563557Z OK 2022-05-18T04:29:38.8563699Z 2022-05-18T04:29:38.8563836Z Generating XML reports... 2022-05-18T04:29:38.8605847Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042932.xml 2022-05-18T04:29:40.0099535Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpir7y2w38 2022-05-18T04:29:40.0100494Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpir7y2w38/_remote_module_non_scriptable.py 2022-05-18T04:29:40.3689988Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:29:40.3704833Z 2022-05-18T04:29:40.3705110Z Running tests... 2022-05-18T04:29:40.3705566Z ---------------------------------------------------------------------- 2022-05-18T04:29:41.9962026Z test_device_map_gpu_non_default_to_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:29:42.0582820Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16607 2022-05-18T04:29:42.0688282Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16608 2022-05-18T04:29:42.0795247Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 16609 2022-05-18T04:29:42.0905078Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 16610 2022-05-18T04:29:43.0107056Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprhkrh7dw 2022-05-18T04:29:43.0108361Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprhkrh7dw/_remote_module_non_scriptable.py 2022-05-18T04:29:43.0396142Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgce2ukb4 2022-05-18T04:29:43.0398589Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgce2ukb4/_remote_module_non_scriptable.py 2022-05-18T04:29:43.0464389Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_k5i0kyj 2022-05-18T04:29:43.0466848Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_k5i0kyj/_remote_module_non_scriptable.py 2022-05-18T04:29:43.0486592Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9ax0vc_t 2022-05-18T04:29:43.0489528Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9ax0vc_t/_remote_module_non_scriptable.py 2022-05-18T04:29:43.3795108Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:43.3938185Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:29:43.4080985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:43.4137760Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:29:49.0085075Z ok (8.638s) 2022-05-18T04:29:49.0085315Z 2022-05-18T04:29:49.0085729Z ---------------------------------------------------------------------- 2022-05-18T04:29:49.0086083Z Ran 1 test in 8.638s 2022-05-18T04:29:49.0086254Z 2022-05-18T04:29:49.0086360Z OK 2022-05-18T04:29:49.0086500Z 2022-05-18T04:29:49.0086639Z Generating XML reports... 2022-05-18T04:29:49.0130721Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042940.xml 2022-05-18T04:29:50.1707082Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8b9nfh6c 2022-05-18T04:29:50.1708162Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8b9nfh6c/_remote_module_non_scriptable.py 2022-05-18T04:29:50.5296136Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:29:50.5310331Z 2022-05-18T04:29:50.5310788Z Running tests... 2022-05-18T04:29:50.5311281Z ---------------------------------------------------------------------- 2022-05-18T04:29:52.1525950Z test_device_map_gpu_to_cpu_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:29:52.2162999Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 16972 2022-05-18T04:29:52.2266839Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 16973 2022-05-18T04:29:52.2375481Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 16974 2022-05-18T04:29:52.2483737Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 16975 2022-05-18T04:29:53.1321187Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6qsjwgjq 2022-05-18T04:29:53.1321801Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6qsjwgjq/_remote_module_non_scriptable.py 2022-05-18T04:29:53.1709653Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpje07f7e4 2022-05-18T04:29:53.1712128Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpje07f7e4/_remote_module_non_scriptable.py 2022-05-18T04:29:53.1895143Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp74x4j2q4 2022-05-18T04:29:53.1898151Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp74x4j2q4/_remote_module_non_scriptable.py 2022-05-18T04:29:53.2058941Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3327l0fn 2022-05-18T04:29:53.2061785Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3327l0fn/_remote_module_non_scriptable.py 2022-05-18T04:29:53.4847255Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:29:53.5272304Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:29:53.5432083Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:29:53.5710017Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:29:56.9617599Z ok (6.430s) 2022-05-18T04:29:56.9617826Z 2022-05-18T04:29:56.9618513Z ---------------------------------------------------------------------- 2022-05-18T04:29:56.9618912Z Ran 1 test in 6.431s 2022-05-18T04:29:56.9619066Z 2022-05-18T04:29:56.9619175Z OK 2022-05-18T04:29:56.9619315Z 2022-05-18T04:29:56.9619453Z Generating XML reports... 2022-05-18T04:29:56.9662683Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042950.xml 2022-05-18T04:29:58.1221170Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdm7aygo4 2022-05-18T04:29:58.1222577Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdm7aygo4/_remote_module_non_scriptable.py 2022-05-18T04:29:58.4801401Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:29:58.4817189Z 2022-05-18T04:29:58.4817349Z Running tests... 2022-05-18T04:29:58.4817795Z ---------------------------------------------------------------------- 2022-05-18T04:30:00.1056951Z test_device_map_gpu_to_cpu_non_default (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:30:00.1710775Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17329 2022-05-18T04:30:00.1821902Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17330 2022-05-18T04:30:00.1927735Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 17331 2022-05-18T04:30:00.2038446Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 17332 2022-05-18T04:30:01.1000573Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv_dc7fa2 2022-05-18T04:30:01.1001589Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv_dc7fa2/_remote_module_non_scriptable.py 2022-05-18T04:30:01.1423649Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3ybxt4c9 2022-05-18T04:30:01.1426245Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3ybxt4c9/_remote_module_non_scriptable.py 2022-05-18T04:30:01.1438111Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjqo094jy 2022-05-18T04:30:01.1440759Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjqo094jy/_remote_module_non_scriptable.py 2022-05-18T04:30:01.1473674Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmn8nq45h 2022-05-18T04:30:01.1476185Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmn8nq45h/_remote_module_non_scriptable.py 2022-05-18T04:30:01.4717733Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:01.4964008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:30:01.4972231Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:01.5032081Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:30:04.9170229Z ok (6.435s) 2022-05-18T04:30:04.9170518Z 2022-05-18T04:30:04.9171272Z ---------------------------------------------------------------------- 2022-05-18T04:30:04.9171639Z Ran 1 test in 6.435s 2022-05-18T04:30:04.9171807Z 2022-05-18T04:30:04.9171893Z OK 2022-05-18T04:30:04.9172028Z 2022-05-18T04:30:04.9172168Z Generating XML reports... 2022-05-18T04:30:04.9215591Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042958.xml 2022-05-18T04:30:06.0874074Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7j6cread 2022-05-18T04:30:06.0875287Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7j6cread/_remote_module_non_scriptable.py 2022-05-18T04:30:06.4676333Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:30:06.4691582Z 2022-05-18T04:30:06.4691790Z Running tests... 2022-05-18T04:30:06.4692362Z ---------------------------------------------------------------------- 2022-05-18T04:30:08.1130614Z test_device_maps_gpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:30:08.1775601Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 17686 2022-05-18T04:30:08.1879412Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 17687 2022-05-18T04:30:08.1987488Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 17688 2022-05-18T04:30:08.2094118Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 17689 2022-05-18T04:30:09.0850126Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_alddo5o 2022-05-18T04:30:09.0851166Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_alddo5o/_remote_module_non_scriptable.py 2022-05-18T04:30:09.1440015Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnqwxus4q 2022-05-18T04:30:09.1441733Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnqwxus4q/_remote_module_non_scriptable.py 2022-05-18T04:30:09.1447943Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkclloihi 2022-05-18T04:30:09.1450917Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkclloihi/_remote_module_non_scriptable.py 2022-05-18T04:30:09.1462856Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsbcid4tm 2022-05-18T04:30:09.1465559Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsbcid4tm/_remote_module_non_scriptable.py 2022-05-18T04:30:09.4383044Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:30:09.4965889Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:09.4990809Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:30:09.5020943Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:15.1282536Z ok (8.659s) 2022-05-18T04:30:15.1282936Z 2022-05-18T04:30:15.1283713Z ---------------------------------------------------------------------- 2022-05-18T04:30:15.1284183Z Ran 1 test in 8.659s 2022-05-18T04:30:15.1284378Z 2022-05-18T04:30:15.1284540Z OK 2022-05-18T04:30:15.1287079Z 2022-05-18T04:30:15.1287287Z Generating XML reports... 2022-05-18T04:30:15.1331934Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043006.xml 2022-05-18T04:30:16.2776638Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0bmezi12 2022-05-18T04:30:16.2777830Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0bmezi12/_remote_module_non_scriptable.py 2022-05-18T04:30:16.6448466Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:30:16.6465654Z 2022-05-18T04:30:16.6466048Z Running tests... 2022-05-18T04:30:16.6466956Z ---------------------------------------------------------------------- 2022-05-18T04:30:18.2936904Z test_device_maps_in_options (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:30:18.3574850Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18051 2022-05-18T04:30:18.3677307Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18052 2022-05-18T04:30:18.3785124Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 18053 2022-05-18T04:30:18.3894353Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 18054 2022-05-18T04:30:19.2745301Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1lblwemc 2022-05-18T04:30:19.2746547Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1lblwemc/_remote_module_non_scriptable.py 2022-05-18T04:30:19.2755368Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmsznrc7_ 2022-05-18T04:30:19.2758278Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmsznrc7_/_remote_module_non_scriptable.py 2022-05-18T04:30:19.2777994Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpys_88jof 2022-05-18T04:30:19.2780390Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpys_88jof/_remote_module_non_scriptable.py 2022-05-18T04:30:19.2942488Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaz79bxn_ 2022-05-18T04:30:19.2944533Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaz79bxn_/_remote_module_non_scriptable.py 2022-05-18T04:30:19.6332693Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:19.6374826Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:30:19.6414885Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:19.6511374Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:30:25.3076442Z ok (8.661s) 2022-05-18T04:30:25.3076675Z 2022-05-18T04:30:25.3077081Z ---------------------------------------------------------------------- 2022-05-18T04:30:25.3077452Z Ran 1 test in 8.661s 2022-05-18T04:30:25.3077624Z 2022-05-18T04:30:25.3077720Z OK 2022-05-18T04:30:25.3077840Z 2022-05-18T04:30:25.3077978Z Generating XML reports... 2022-05-18T04:30:25.3121277Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043016.xml 2022-05-18T04:30:26.4694224Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9heabfty 2022-05-18T04:30:26.4696225Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9heabfty/_remote_module_non_scriptable.py 2022-05-18T04:30:26.8317887Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:30:26.8332928Z 2022-05-18T04:30:26.8333149Z Running tests... 2022-05-18T04:30:26.8333593Z ---------------------------------------------------------------------- 2022-05-18T04:30:28.4645880Z test_device_maps_invalid_max_local_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:30:28.5292561Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18416 2022-05-18T04:30:28.5395520Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18417 2022-05-18T04:30:28.5503487Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 18418 2022-05-18T04:30:28.5612936Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 18419 2022-05-18T04:30:29.4541654Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbkohszw_ 2022-05-18T04:30:29.4542530Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbkohszw_/_remote_module_non_scriptable.py 2022-05-18T04:30:29.4604700Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpefg6tcb4 2022-05-18T04:30:29.4607598Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpefg6tcb4/_remote_module_non_scriptable.py 2022-05-18T04:30:29.4615504Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvuj18fp2 2022-05-18T04:30:29.4618208Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvuj18fp2/_remote_module_non_scriptable.py 2022-05-18T04:30:29.4742351Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6_dcqpuj 2022-05-18T04:30:29.4745609Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6_dcqpuj/_remote_module_non_scriptable.py 2022-05-18T04:30:29.8086474Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:30:29.8169065Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:29.8259730Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:30:29.8404601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:30.1674368Z ok (3.334s) 2022-05-18T04:30:30.1674578Z 2022-05-18T04:30:30.1675214Z ---------------------------------------------------------------------- 2022-05-18T04:30:30.1675646Z Ran 1 test in 3.334s 2022-05-18T04:30:30.1675820Z 2022-05-18T04:30:30.1675923Z OK 2022-05-18T04:30:30.1676065Z 2022-05-18T04:30:30.1676179Z Generating XML reports... 2022-05-18T04:30:30.1721188Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043026.xml 2022-05-18T04:30:31.3605793Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp5yvv5no 2022-05-18T04:30:31.3606888Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp5yvv5no/_remote_module_non_scriptable.py 2022-05-18T04:30:31.7299016Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:30:31.7314162Z 2022-05-18T04:30:31.7314399Z Running tests... 2022-05-18T04:30:31.7314846Z ---------------------------------------------------------------------- 2022-05-18T04:30:33.3876438Z test_device_maps_invalid_max_remote_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:30:33.4519192Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18609 2022-05-18T04:30:33.4623441Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18610 2022-05-18T04:30:33.4731306Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 18611 2022-05-18T04:30:33.4838433Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 18612 2022-05-18T04:30:34.3405846Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzb_8d_3y 2022-05-18T04:30:34.3407021Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzb_8d_3y/_remote_module_non_scriptable.py 2022-05-18T04:30:34.4113265Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_7j0lryo 2022-05-18T04:30:34.4115194Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_7j0lryo/_remote_module_non_scriptable.py 2022-05-18T04:30:34.4244573Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5f8ytb6a 2022-05-18T04:30:34.4246871Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5f8ytb6a/_remote_module_non_scriptable.py 2022-05-18T04:30:34.4272065Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfbh_8lw2 2022-05-18T04:30:34.4274428Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfbh_8lw2/_remote_module_non_scriptable.py 2022-05-18T04:30:34.7034738Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:34.7794832Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:34.7823192Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:30:34.7917597Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:30:35.0895877Z ok (3.358s) 2022-05-18T04:30:35.0896138Z 2022-05-18T04:30:35.0896720Z ---------------------------------------------------------------------- 2022-05-18T04:30:35.0897075Z Ran 1 test in 3.358s 2022-05-18T04:30:35.0897243Z 2022-05-18T04:30:35.0897341Z OK 2022-05-18T04:30:35.0897482Z 2022-05-18T04:30:35.0897602Z Generating XML reports... 2022-05-18T04:30:35.0940980Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043031.xml 2022-05-18T04:30:36.2734169Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnmtq6_wz 2022-05-18T04:30:36.2735658Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnmtq6_wz/_remote_module_non_scriptable.py 2022-05-18T04:30:36.6440434Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:30:36.6456371Z 2022-05-18T04:30:36.6456812Z Running tests... 2022-05-18T04:30:36.6457326Z ---------------------------------------------------------------------- 2022-05-18T04:30:38.3031612Z test_device_maps_invalid_min_device (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:30:38.3682036Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18802 2022-05-18T04:30:38.3788262Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18803 2022-05-18T04:30:38.3897433Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 18804 2022-05-18T04:30:38.4007635Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 18805 2022-05-18T04:30:39.2237615Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjhbo7jcz 2022-05-18T04:30:39.2238802Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjhbo7jcz/_remote_module_non_scriptable.py 2022-05-18T04:30:39.2753684Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkpww9zc8 2022-05-18T04:30:39.2755760Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkpww9zc8/_remote_module_non_scriptable.py 2022-05-18T04:30:39.2787654Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_265_rna 2022-05-18T04:30:39.2790508Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_265_rna/_remote_module_non_scriptable.py 2022-05-18T04:30:39.2883921Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph8hcn7wn 2022-05-18T04:30:39.2886725Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph8hcn7wn/_remote_module_non_scriptable.py 2022-05-18T04:30:39.5827993Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:30:39.6307905Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:39.6405855Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:30:39.6529929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:39.9065056Z ok (3.261s) 2022-05-18T04:30:39.9065286Z 2022-05-18T04:30:39.9065790Z ---------------------------------------------------------------------- 2022-05-18T04:30:39.9066284Z Ran 1 test in 3.261s 2022-05-18T04:30:39.9066456Z 2022-05-18T04:30:39.9066569Z OK 2022-05-18T04:30:39.9066713Z 2022-05-18T04:30:39.9066850Z Generating XML reports... 2022-05-18T04:30:39.9109537Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043036.xml 2022-05-18T04:30:41.0834375Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq81mjuj9 2022-05-18T04:30:41.0835205Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq81mjuj9/_remote_module_non_scriptable.py 2022-05-18T04:30:41.4564373Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:30:41.4580168Z 2022-05-18T04:30:41.4580498Z Running tests... 2022-05-18T04:30:41.4580969Z ---------------------------------------------------------------------- 2022-05-18T04:30:43.1127923Z test_device_maps_many_to_one (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:30:43.1783816Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 18983 2022-05-18T04:30:43.1888961Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 18984 2022-05-18T04:30:43.1998820Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 18985 2022-05-18T04:30:43.2106511Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 18986 2022-05-18T04:30:44.1745064Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1ppvme96 2022-05-18T04:30:44.1746112Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1ppvme96/_remote_module_non_scriptable.py 2022-05-18T04:30:44.1753630Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk0fnxic7 2022-05-18T04:30:44.1756540Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk0fnxic7/_remote_module_non_scriptable.py 2022-05-18T04:30:44.1910918Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8dds7hrj 2022-05-18T04:30:44.1913624Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8dds7hrj/_remote_module_non_scriptable.py 2022-05-18T04:30:44.2119419Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbdh3bhk1 2022-05-18T04:30:44.2122042Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbdh3bhk1/_remote_module_non_scriptable.py 2022-05-18T04:30:44.5292097Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:30:44.5295188Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:44.5568412Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:44.5686510Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:30:44.8164061Z ok (3.358s) 2022-05-18T04:30:44.8165637Z 2022-05-18T04:30:44.8166366Z ---------------------------------------------------------------------- 2022-05-18T04:30:44.8167073Z Ran 1 test in 3.358s 2022-05-18T04:30:44.8167301Z 2022-05-18T04:30:44.8167401Z OK 2022-05-18T04:30:44.8167540Z 2022-05-18T04:30:44.8167679Z Generating XML reports... 2022-05-18T04:30:44.8209547Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043041.xml 2022-05-18T04:30:45.9717110Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjm5u0wz4 2022-05-18T04:30:45.9718349Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjm5u0wz4/_remote_module_non_scriptable.py 2022-05-18T04:30:46.3299595Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:30:46.3313930Z 2022-05-18T04:30:46.3314190Z Running tests... 2022-05-18T04:30:46.3315141Z ---------------------------------------------------------------------- 2022-05-18T04:30:47.9421302Z test_device_maps_missing_config (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:30:48.0056358Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19176 2022-05-18T04:30:48.0161863Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19177 2022-05-18T04:30:48.0270023Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 19178 2022-05-18T04:30:48.0378601Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 19179 2022-05-18T04:30:48.8976617Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_2b8rn1y 2022-05-18T04:30:48.8978134Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_2b8rn1y/_remote_module_non_scriptable.py 2022-05-18T04:30:48.9040273Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr34xsdg5 2022-05-18T04:30:48.9042971Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr34xsdg5/_remote_module_non_scriptable.py 2022-05-18T04:30:48.9206099Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprqd6gnuy 2022-05-18T04:30:48.9208520Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprqd6gnuy/_remote_module_non_scriptable.py 2022-05-18T04:30:48.9211161Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpha75w0f0 2022-05-18T04:30:48.9214353Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpha75w0f0/_remote_module_non_scriptable.py 2022-05-18T04:30:49.2550164Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:49.2652166Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:49.2838981Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:30:49.2917589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:30:51.5483519Z ok (5.217s) 2022-05-18T04:30:51.5483947Z 2022-05-18T04:30:51.5484621Z ---------------------------------------------------------------------- 2022-05-18T04:30:51.5485241Z Ran 1 test in 5.217s 2022-05-18T04:30:51.5485553Z 2022-05-18T04:30:51.5485721Z OK 2022-05-18T04:30:51.5485990Z 2022-05-18T04:30:51.5486256Z Generating XML reports... 2022-05-18T04:30:51.5529741Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043046.xml 2022-05-18T04:30:52.7184194Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph2ywsomh 2022-05-18T04:30:52.7185578Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph2ywsomh/_remote_module_non_scriptable.py 2022-05-18T04:30:53.0762994Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:30:53.0778058Z 2022-05-18T04:30:53.0778512Z Running tests... 2022-05-18T04:30:53.0779067Z ---------------------------------------------------------------------- 2022-05-18T04:30:54.6877958Z test_device_maps_missing_config_loop (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:30:54.7502165Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19529 2022-05-18T04:30:54.7606440Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19530 2022-05-18T04:30:54.7714277Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 19531 2022-05-18T04:30:54.7821664Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 19532 2022-05-18T04:30:55.6596427Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpazcw5u1z 2022-05-18T04:30:55.6597058Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpazcw5u1z/_remote_module_non_scriptable.py 2022-05-18T04:30:55.6723896Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3fqi1s6r 2022-05-18T04:30:55.6726884Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3fqi1s6r/_remote_module_non_scriptable.py 2022-05-18T04:30:55.7094607Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk4cgr97i 2022-05-18T04:30:55.7097403Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk4cgr97i/_remote_module_non_scriptable.py 2022-05-18T04:30:55.7337957Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptg_gwyut 2022-05-18T04:30:55.7340825Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptg_gwyut/_remote_module_non_scriptable.py 2022-05-18T04:30:56.0145291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:30:56.0281087Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:30:56.0745022Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:30:56.0993777Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:30:58.3930006Z ok (5.315s) 2022-05-18T04:30:58.3933776Z 2022-05-18T04:30:58.3934474Z ---------------------------------------------------------------------- 2022-05-18T04:30:58.3934859Z Ran 1 test in 5.315s 2022-05-18T04:30:58.3935035Z 2022-05-18T04:30:58.3935142Z OK 2022-05-18T04:30:58.3935279Z 2022-05-18T04:30:58.3935399Z Generating XML reports... 2022-05-18T04:30:58.3980953Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043053.xml 2022-05-18T04:30:59.5837257Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1j95feb0 2022-05-18T04:30:59.5838287Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1j95feb0/_remote_module_non_scriptable.py 2022-05-18T04:30:59.9547470Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:30:59.9562266Z 2022-05-18T04:30:59.9562689Z Running tests... 2022-05-18T04:30:59.9563175Z ---------------------------------------------------------------------- 2022-05-18T04:31:01.6048002Z test_device_maps_missing_config_not_timeout (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:31:01.6717861Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 19882 2022-05-18T04:31:01.6825797Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 19883 2022-05-18T04:31:01.6936219Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 19884 2022-05-18T04:31:01.7044071Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 19885 2022-05-18T04:31:02.6036383Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsbfwwmeh 2022-05-18T04:31:02.6036984Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy7vt5o_9 2022-05-18T04:31:02.6037569Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsbfwwmeh/_remote_module_non_scriptable.py 2022-05-18T04:31:02.6039442Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy7vt5o_9/_remote_module_non_scriptable.py 2022-05-18T04:31:02.6289240Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvqhplov8 2022-05-18T04:31:02.6292555Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvqhplov8/_remote_module_non_scriptable.py 2022-05-18T04:31:02.6477655Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptwfc0biq 2022-05-18T04:31:02.6480366Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptwfc0biq/_remote_module_non_scriptable.py 2022-05-18T04:31:02.9602333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:02.9654253Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:03.0030052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:31:03.0152405Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:31:05.2154840Z ok (5.259s) 2022-05-18T04:31:05.2155082Z 2022-05-18T04:31:05.2155493Z ---------------------------------------------------------------------- 2022-05-18T04:31:05.2155826Z Ran 1 test in 5.259s 2022-05-18T04:31:05.2155995Z 2022-05-18T04:31:05.2156093Z OK 2022-05-18T04:31:05.2156230Z 2022-05-18T04:31:05.2156363Z Generating XML reports... 2022-05-18T04:31:05.2199183Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043059.xml 2022-05-18T04:31:06.3941583Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpllw21nv1 2022-05-18T04:31:06.3942419Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpllw21nv1/_remote_module_non_scriptable.py 2022-05-18T04:31:06.7621161Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:31:06.7637147Z 2022-05-18T04:31:06.7637542Z Running tests... 2022-05-18T04:31:06.7638029Z ---------------------------------------------------------------------- 2022-05-18T04:31:08.4241836Z test_device_maps_missing_config_remote (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:31:08.4885949Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20235 2022-05-18T04:31:08.4993577Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20236 2022-05-18T04:31:08.5102503Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 20237 2022-05-18T04:31:08.5209241Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 20238 2022-05-18T04:31:09.3923749Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph9hsjh5h 2022-05-18T04:31:09.3924731Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph9hsjh5h/_remote_module_non_scriptable.py 2022-05-18T04:31:09.3962524Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3an7vpyw 2022-05-18T04:31:09.3965104Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3an7vpyw/_remote_module_non_scriptable.py 2022-05-18T04:31:09.4036992Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoqo5nwn6 2022-05-18T04:31:09.4039750Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoqo5nwn6/_remote_module_non_scriptable.py 2022-05-18T04:31:09.4044036Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfvgy0l4u 2022-05-18T04:31:09.4046885Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfvgy0l4u/_remote_module_non_scriptable.py 2022-05-18T04:31:09.7490932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:09.7514511Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:09.7648022Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:31:09.7665219Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:31:12.0323538Z ok (5.268s) 2022-05-18T04:31:12.0323886Z 2022-05-18T04:31:12.0324510Z ---------------------------------------------------------------------- 2022-05-18T04:31:12.0324878Z Ran 1 test in 5.269s 2022-05-18T04:31:12.0325075Z 2022-05-18T04:31:12.0325152Z OK 2022-05-18T04:31:12.0325296Z 2022-05-18T04:31:12.0325437Z Generating XML reports... 2022-05-18T04:31:12.0370244Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043106.xml 2022-05-18T04:31:13.2090909Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9_bs45eb 2022-05-18T04:31:13.2092097Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9_bs45eb/_remote_module_non_scriptable.py 2022-05-18T04:31:13.5659511Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:31:13.5674097Z 2022-05-18T04:31:13.5674376Z Running tests... 2022-05-18T04:31:13.5674810Z ---------------------------------------------------------------------- 2022-05-18T04:31:15.1847852Z test_device_maps_missing_config_remote_response (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:31:15.2470109Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20588 2022-05-18T04:31:15.2574023Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20589 2022-05-18T04:31:15.2680115Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 20590 2022-05-18T04:31:15.2788550Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 20591 2022-05-18T04:31:16.1634949Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnovgm2sv 2022-05-18T04:31:16.1636147Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnovgm2sv/_remote_module_non_scriptable.py 2022-05-18T04:31:16.1908830Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpji5rj1q7 2022-05-18T04:31:16.1911612Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpji5rj1q7/_remote_module_non_scriptable.py 2022-05-18T04:31:16.2114387Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplr1h7xun 2022-05-18T04:31:16.2116777Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplr1h7xun/_remote_module_non_scriptable.py 2022-05-18T04:31:16.2229351Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp40z212tc 2022-05-18T04:31:16.2231842Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp40z212tc/_remote_module_non_scriptable.py 2022-05-18T04:31:16.5238260Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:16.5456967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:16.5661722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:31:16.5802642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:31:18.7892615Z ok (5.221s) 2022-05-18T04:31:18.7892848Z 2022-05-18T04:31:18.7893317Z ---------------------------------------------------------------------- 2022-05-18T04:31:18.7893874Z Ran 1 test in 5.222s 2022-05-18T04:31:18.7894047Z 2022-05-18T04:31:18.7894130Z OK 2022-05-18T04:31:18.7894270Z 2022-05-18T04:31:18.7894407Z Generating XML reports... 2022-05-18T04:31:18.7936622Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043113.xml 2022-05-18T04:31:19.9790827Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmput891dps 2022-05-18T04:31:19.9791840Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmput891dps/_remote_module_non_scriptable.py 2022-05-18T04:31:20.3499239Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:31:20.3514921Z 2022-05-18T04:31:20.3515357Z Running tests... 2022-05-18T04:31:20.3516068Z ---------------------------------------------------------------------- 2022-05-18T04:31:21.9979866Z test_device_maps_missing_config_response (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:31:22.0623784Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 20941 2022-05-18T04:31:22.0727335Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 20942 2022-05-18T04:31:22.0836618Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 20943 2022-05-18T04:31:22.0944362Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 20944 2022-05-18T04:31:22.9875207Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9773vj_n 2022-05-18T04:31:22.9876524Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9773vj_n/_remote_module_non_scriptable.py 2022-05-18T04:31:22.9942710Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1uy4jv56 2022-05-18T04:31:22.9945566Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1uy4jv56/_remote_module_non_scriptable.py 2022-05-18T04:31:23.0034806Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx4s35qla 2022-05-18T04:31:23.0037484Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx4s35qla/_remote_module_non_scriptable.py 2022-05-18T04:31:23.0108913Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3z43qnrj 2022-05-18T04:31:23.0111651Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3z43qnrj/_remote_module_non_scriptable.py 2022-05-18T04:31:23.3450266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:23.3570694Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:31:23.3632190Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:23.3653541Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:31:25.6062576Z ok (5.254s) 2022-05-18T04:31:25.6062965Z 2022-05-18T04:31:25.6063450Z ---------------------------------------------------------------------- 2022-05-18T04:31:25.6063790Z Ran 1 test in 5.255s 2022-05-18T04:31:25.6063966Z 2022-05-18T04:31:25.6064063Z OK 2022-05-18T04:31:25.6064200Z 2022-05-18T04:31:25.6064340Z Generating XML reports... 2022-05-18T04:31:25.6110102Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043120.xml 2022-05-18T04:31:26.7940839Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps5tf00ec 2022-05-18T04:31:26.7942028Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps5tf00ec/_remote_module_non_scriptable.py 2022-05-18T04:31:27.1624584Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:31:27.1640836Z 2022-05-18T04:31:27.1641171Z Running tests... 2022-05-18T04:31:27.1641695Z ---------------------------------------------------------------------- 2022-05-18T04:31:28.8007333Z test_device_maps_missing_config_response_loop (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:31:28.8649741Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21294 2022-05-18T04:31:28.8753450Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21295 2022-05-18T04:31:28.8863279Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 21296 2022-05-18T04:31:28.8972533Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 21297 2022-05-18T04:31:29.7712409Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa2g4z7gb 2022-05-18T04:31:29.7713347Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa2g4z7gb/_remote_module_non_scriptable.py 2022-05-18T04:31:29.8082258Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl3vtf_z3 2022-05-18T04:31:29.8085075Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl3vtf_z3/_remote_module_non_scriptable.py 2022-05-18T04:31:29.8139851Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8ye896e5 2022-05-18T04:31:29.8142775Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8ye896e5/_remote_module_non_scriptable.py 2022-05-18T04:31:29.8298231Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdckxxf71 2022-05-18T04:31:29.8301586Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdckxxf71/_remote_module_non_scriptable.py 2022-05-18T04:31:30.1288653Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:30.1738117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:31:30.1763316Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:30.2062525Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:31:32.5088196Z ok (5.344s) 2022-05-18T04:31:32.5088415Z 2022-05-18T04:31:32.5088819Z ---------------------------------------------------------------------- 2022-05-18T04:31:32.5089183Z Ran 1 test in 5.345s 2022-05-18T04:31:32.5089332Z 2022-05-18T04:31:32.5089430Z OK 2022-05-18T04:31:32.5089571Z 2022-05-18T04:31:32.5090016Z Generating XML reports... 2022-05-18T04:31:32.5133150Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043127.xml 2022-05-18T04:31:33.6786972Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptwl1sqb7 2022-05-18T04:31:33.6788305Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptwl1sqb7/_remote_module_non_scriptable.py 2022-05-18T04:31:34.0461052Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:31:34.0476292Z 2022-05-18T04:31:34.0476439Z Running tests... 2022-05-18T04:31:34.0477158Z ---------------------------------------------------------------------- 2022-05-18T04:31:35.6836455Z test_device_maps_multi_gpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:31:35.7475407Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 21647 2022-05-18T04:31:35.7580442Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 21648 2022-05-18T04:31:35.7689035Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 21649 2022-05-18T04:31:35.7799352Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 21650 2022-05-18T04:31:36.6457413Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmxedt9xz 2022-05-18T04:31:36.6458042Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmxedt9xz/_remote_module_non_scriptable.py 2022-05-18T04:31:36.6590820Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgwbdnmk7 2022-05-18T04:31:36.6594097Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgwbdnmk7/_remote_module_non_scriptable.py 2022-05-18T04:31:36.6610965Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbgxh4v5f 2022-05-18T04:31:36.6614105Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbgxh4v5f/_remote_module_non_scriptable.py 2022-05-18T04:31:36.6635646Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj344byl2 2022-05-18T04:31:36.6638486Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj344byl2/_remote_module_non_scriptable.py 2022-05-18T04:31:37.0055217Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:37.0138327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:31:37.0150470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:37.0285549Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:31:42.5982688Z ok (8.550s) 2022-05-18T04:31:42.5982945Z 2022-05-18T04:31:42.5983367Z ---------------------------------------------------------------------- 2022-05-18T04:31:42.5983737Z Ran 1 test in 8.551s 2022-05-18T04:31:42.5983904Z 2022-05-18T04:31:42.5984001Z OK 2022-05-18T04:31:42.5984119Z 2022-05-18T04:31:42.5984256Z Generating XML reports... 2022-05-18T04:31:42.6027258Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043134.xml 2022-05-18T04:31:43.7735529Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe2lxi4ky 2022-05-18T04:31:43.7736765Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe2lxi4ky/_remote_module_non_scriptable.py 2022-05-18T04:31:44.1453566Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:31:44.1469120Z 2022-05-18T04:31:44.1469261Z Running tests... 2022-05-18T04:31:44.1469996Z ---------------------------------------------------------------------- 2022-05-18T04:31:45.8040753Z test_device_maps_multi_gpu_self (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:31:45.8670287Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22012 2022-05-18T04:31:45.8776584Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22013 2022-05-18T04:31:45.8882292Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 22014 2022-05-18T04:31:45.8991008Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 22015 2022-05-18T04:31:46.7871099Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvh1k4okf 2022-05-18T04:31:46.7872217Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvh1k4okf/_remote_module_non_scriptable.py 2022-05-18T04:31:46.7875796Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwrnvbq0q 2022-05-18T04:31:46.7879584Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwrnvbq0q/_remote_module_non_scriptable.py 2022-05-18T04:31:46.7900669Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps8lfcj04 2022-05-18T04:31:46.7903049Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps8lfcj04/_remote_module_non_scriptable.py 2022-05-18T04:31:46.8325683Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsmlco5j1 2022-05-18T04:31:46.8327552Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsmlco5j1/_remote_module_non_scriptable.py 2022-05-18T04:31:47.1431359Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:31:47.1468322Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:31:47.1627914Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:47.1946181Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:52.9174599Z ok (8.770s) 2022-05-18T04:31:52.9174862Z 2022-05-18T04:31:52.9175305Z ---------------------------------------------------------------------- 2022-05-18T04:31:52.9175662Z Ran 1 test in 8.770s 2022-05-18T04:31:52.9175811Z 2022-05-18T04:31:52.9175911Z OK 2022-05-18T04:31:52.9176050Z 2022-05-18T04:31:52.9176191Z Generating XML reports... 2022-05-18T04:31:52.9220365Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043144.xml 2022-05-18T04:31:54.0856298Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8jnincmz 2022-05-18T04:31:54.0857138Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8jnincmz/_remote_module_non_scriptable.py 2022-05-18T04:31:54.4512323Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:31:54.4527337Z 2022-05-18T04:31:54.4527768Z Running tests... 2022-05-18T04:31:54.4528282Z ---------------------------------------------------------------------- 2022-05-18T04:31:56.0828589Z test_device_maps_one_to_many (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:31:56.1476355Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22369 2022-05-18T04:31:56.1581652Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22370 2022-05-18T04:31:56.1689933Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 22371 2022-05-18T04:31:56.1798212Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 22372 2022-05-18T04:31:57.0466675Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnczwt9a9 2022-05-18T04:31:57.0467311Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnczwt9a9/_remote_module_non_scriptable.py 2022-05-18T04:31:57.0616892Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxom2jii2 2022-05-18T04:31:57.0620320Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxom2jii2/_remote_module_non_scriptable.py 2022-05-18T04:31:57.0637574Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr03omaf4 2022-05-18T04:31:57.0640590Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr03omaf4/_remote_module_non_scriptable.py 2022-05-18T04:31:57.1240914Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjdqvi9fl 2022-05-18T04:31:57.1243375Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjdqvi9fl/_remote_module_non_scriptable.py 2022-05-18T04:31:57.4059297Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:31:57.4259835Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:31:57.4265353Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:31:57.4844775Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:31:57.6856527Z ok (3.233s) 2022-05-18T04:31:57.6856703Z 2022-05-18T04:31:57.6857109Z ---------------------------------------------------------------------- 2022-05-18T04:31:57.6857469Z Ran 1 test in 3.233s 2022-05-18T04:31:57.6857640Z 2022-05-18T04:31:57.6857744Z OK 2022-05-18T04:31:57.6857885Z 2022-05-18T04:31:57.6858000Z Generating XML reports... 2022-05-18T04:31:57.6901992Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043154.xml 2022-05-18T04:31:58.8453838Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1283oan8 2022-05-18T04:31:58.8455108Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1283oan8/_remote_module_non_scriptable.py 2022-05-18T04:31:59.2044371Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:31:59.2059358Z 2022-05-18T04:31:59.2059865Z Running tests... 2022-05-18T04:31:59.2060386Z ---------------------------------------------------------------------- 2022-05-18T04:32:00.8195080Z test_device_maps_remote (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:32:00.8830101Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22550 2022-05-18T04:32:00.8933256Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22551 2022-05-18T04:32:00.9040895Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 22552 2022-05-18T04:32:00.9147400Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 22553 2022-05-18T04:32:01.8275626Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuviekfzt 2022-05-18T04:32:01.8276674Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuviekfzt/_remote_module_non_scriptable.py 2022-05-18T04:32:01.8317109Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx9p_h20f 2022-05-18T04:32:01.8319633Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx9p_h20f/_remote_module_non_scriptable.py 2022-05-18T04:32:01.8817217Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaaedwyhz 2022-05-18T04:32:01.8818377Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_8qupa_6 2022-05-18T04:32:01.8818922Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaaedwyhz/_remote_module_non_scriptable.py 2022-05-18T04:32:01.8821641Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_8qupa_6/_remote_module_non_scriptable.py 2022-05-18T04:32:02.1822757Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:02.1894493Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:02.2348976Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:32:02.2522996Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:32:07.9353925Z ok (8.729s) 2022-05-18T04:32:07.9354299Z 2022-05-18T04:32:07.9354915Z ---------------------------------------------------------------------- 2022-05-18T04:32:07.9355293Z Ran 1 test in 8.729s 2022-05-18T04:32:07.9355442Z 2022-05-18T04:32:07.9355541Z OK 2022-05-18T04:32:07.9355682Z 2022-05-18T04:32:07.9355823Z Generating XML reports... 2022-05-18T04:32:07.9399381Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043159.xml 2022-05-18T04:32:09.1051887Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd799u_8b 2022-05-18T04:32:09.1053465Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd799u_8b/_remote_module_non_scriptable.py 2022-05-18T04:32:09.4777572Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:32:09.4792579Z 2022-05-18T04:32:09.4792867Z Running tests... 2022-05-18T04:32:09.4793309Z ---------------------------------------------------------------------- 2022-05-18T04:32:11.1424732Z test_device_maps_return_to_gpu (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:32:11.2080808Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 22915 2022-05-18T04:32:11.2188194Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 22916 2022-05-18T04:32:11.2297902Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 22917 2022-05-18T04:32:11.2406468Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 22918 2022-05-18T04:32:12.1580283Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwn0utpo6 2022-05-18T04:32:12.1580953Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwn0utpo6/_remote_module_non_scriptable.py 2022-05-18T04:32:12.1991071Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0gql6j1r 2022-05-18T04:32:12.1991656Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe36vet8k 2022-05-18T04:32:12.1994100Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe36vet8k/_remote_module_non_scriptable.py 2022-05-18T04:32:12.1994670Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0gql6j1r/_remote_module_non_scriptable.py 2022-05-18T04:32:12.2069917Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptzgth0q2 2022-05-18T04:32:12.2072665Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptzgth0q2/_remote_module_non_scriptable.py 2022-05-18T04:32:12.5173846Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:12.5570031Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:12.5596179Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:32:12.5696971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:32:12.7462022Z skip: Need at least 4 CUDA devices (3.267s) 2022-05-18T04:32:12.7462288Z 2022-05-18T04:32:12.7462702Z ---------------------------------------------------------------------- 2022-05-18T04:32:12.7463052Z Ran 1 test in 3.267s 2022-05-18T04:32:12.7463222Z 2022-05-18T04:32:12.7463316Z OK (skipped=1) 2022-05-18T04:32:12.7463804Z 2022-05-18T04:32:12.7464347Z Generating XML reports... 2022-05-18T04:32:12.7507380Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043209.xml 2022-05-18T04:32:13.9182676Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptxy9s_pl 2022-05-18T04:32:13.9184056Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptxy9s_pl/_remote_module_non_scriptable.py 2022-05-18T04:32:14.2877300Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:32:14.2892732Z 2022-05-18T04:32:14.2892980Z Running tests... 2022-05-18T04:32:14.2893428Z ---------------------------------------------------------------------- 2022-05-18T04:32:15.9226959Z test_device_maps_return_to_gpu_self (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:32:15.9873030Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23096 2022-05-18T04:32:15.9978875Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23097 2022-05-18T04:32:16.0086345Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 23098 2022-05-18T04:32:16.0195125Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 23099 2022-05-18T04:32:16.9075737Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpitbd76a3 2022-05-18T04:32:16.9076655Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpitbd76a3/_remote_module_non_scriptable.py 2022-05-18T04:32:16.9079196Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpidmu96o9 2022-05-18T04:32:16.9081878Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpidmu96o9/_remote_module_non_scriptable.py 2022-05-18T04:32:16.9436813Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa3s8ugrf 2022-05-18T04:32:16.9439057Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa3s8ugrf/_remote_module_non_scriptable.py 2022-05-18T04:32:16.9523398Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8f9x7_qi 2022-05-18T04:32:16.9526345Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8f9x7_qi/_remote_module_non_scriptable.py 2022-05-18T04:32:17.2622432Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:17.2645218Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:32:17.2981626Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:32:17.3105647Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:17.5249193Z skip: Need at least 4 CUDA devices (3.235s) 2022-05-18T04:32:17.5249451Z 2022-05-18T04:32:17.5249842Z ---------------------------------------------------------------------- 2022-05-18T04:32:17.5250532Z Ran 1 test in 3.236s 2022-05-18T04:32:17.5250709Z 2022-05-18T04:32:17.5250837Z OK (skipped=1) 2022-05-18T04:32:17.5250998Z 2022-05-18T04:32:17.5251129Z Generating XML reports... 2022-05-18T04:32:17.5295966Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043214.xml 2022-05-18T04:32:18.6982846Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5_21p_kw 2022-05-18T04:32:18.6984915Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5_21p_kw/_remote_module_non_scriptable.py 2022-05-18T04:32:19.0703724Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:32:19.0719787Z 2022-05-18T04:32:19.0720047Z Running tests... 2022-05-18T04:32:19.0720491Z ---------------------------------------------------------------------- 2022-05-18T04:32:20.7230369Z test_device_maps_wrong_worker_name (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:32:20.7896711Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23277 2022-05-18T04:32:20.8000485Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23278 2022-05-18T04:32:20.8108664Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 23279 2022-05-18T04:32:20.8217965Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 23280 2022-05-18T04:32:21.7234949Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwoggqkp_ 2022-05-18T04:32:21.7235878Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwoggqkp_/_remote_module_non_scriptable.py 2022-05-18T04:32:21.7269719Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdoxfen8y 2022-05-18T04:32:21.7272737Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdoxfen8y/_remote_module_non_scriptable.py 2022-05-18T04:32:21.7479200Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5z9hv_qi 2022-05-18T04:32:21.7481451Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5z9hv_qi/_remote_module_non_scriptable.py 2022-05-18T04:32:21.7989601Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9ze7we01 2022-05-18T04:32:21.7992244Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9ze7we01/_remote_module_non_scriptable.py 2022-05-18T04:32:22.0796103Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:32:22.1023529Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:22.1050429Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:22.1580137Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:32:22.4276408Z ok (3.355s) 2022-05-18T04:32:22.4276638Z 2022-05-18T04:32:22.4277158Z ---------------------------------------------------------------------- 2022-05-18T04:32:22.4277491Z Ran 1 test in 3.356s 2022-05-18T04:32:22.4277671Z 2022-05-18T04:32:22.4277773Z OK 2022-05-18T04:32:22.4277916Z 2022-05-18T04:32:22.4278064Z Generating XML reports... 2022-05-18T04:32:22.4323100Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043219.xml 2022-05-18T04:32:23.6054210Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpreb1snyv 2022-05-18T04:32:23.6055615Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpreb1snyv/_remote_module_non_scriptable.py 2022-05-18T04:32:23.9730389Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:32:23.9746254Z 2022-05-18T04:32:23.9746497Z Running tests... 2022-05-18T04:32:23.9746935Z ---------------------------------------------------------------------- 2022-05-18T04:32:25.6382802Z test_device_mismatch (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:32:25.7049270Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23470 2022-05-18T04:32:25.7155241Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23471 2022-05-18T04:32:25.7264229Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 23472 2022-05-18T04:32:25.7373341Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 23473 2022-05-18T04:32:26.6204419Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfg5nkz__ 2022-05-18T04:32:26.6205646Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfg5nkz__/_remote_module_non_scriptable.py 2022-05-18T04:32:26.6577097Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph3099b0l 2022-05-18T04:32:26.6579605Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph3099b0l/_remote_module_non_scriptable.py 2022-05-18T04:32:26.6592393Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt8nhpcle 2022-05-18T04:32:26.6595052Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt8nhpcle/_remote_module_non_scriptable.py 2022-05-18T04:32:26.6644899Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphzr7ltlk 2022-05-18T04:32:26.6647232Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphzr7ltlk/_remote_module_non_scriptable.py 2022-05-18T04:32:26.9925053Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:32:27.0107831Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:27.0144808Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:32:27.0337172Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:29.9955409Z On WorkerInfo(id=1, name=worker1): 2022-05-18T04:32:29.9968243Z RuntimeError('Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!\nException raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f7dfa9e91eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7f7dfa9e4bbe in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7f7e048466db in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7f7e04848b1f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7f7e0484a2e7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7f7e04a17faf in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x2a45b96 (0x7f7dfd66db96 in 
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #7: + 0x2a45cb6 (0x7f7dfd66dcb6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7f7e052ba6c8 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x2b91b65 (0x7f7e06657b65 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: + 0x2b922f9 (0x7f7e066582f9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7f7e052e5d33 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: + 0x2c1ec7 (0x7f7e11392ec7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #13: + 0x2c2206 (0x7f7e11393206 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #14: _PyMethodDef_RawFastCallDict + 0x264 (0x5645377ff3f4 in /opt/conda/bin/python)\nframe #15: _PyObject_FastCallDict + 0x6e (0x5645377d02ee in /opt/conda/bin/python)\nframe #16: + 0x135eb0 (0x5645377ebeb0 in /opt/conda/bin/python)\nframe #17: + 0x1f5a6f (0x5645378aba6f in /opt/conda/bin/python)\nframe #18: PyNumber_Add + 0x41 (0x56453780a0d1 in /opt/conda/bin/python)\nframe #19: _PyEval_EvalFrameDefault + 0xfba (0x564537879f5a in /opt/conda/bin/python)\nframe #20: _PyFunction_FastCallDict + 0x118 (0x5645377edcf8 in /opt/conda/bin/python)\nframe #21: _PyEval_EvalFrameDefault + 0x1cb8 (0x56453787ac58 in /opt/conda/bin/python)\nframe #22: _PyFunction_FastCallDict + 0x118 (0x5645377edcf8 in /opt/conda/bin/python)\nframe #23: + 0x9839ef (0x7f7e11a549ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #24: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f7e11a5339d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #25: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7f7e11a55d83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #26: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7f7e11a59dc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #27: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7f7e07782a9c in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #28: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f7e11a55b75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #29: + 0x3cb5e23 (0x7f7e0777be23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #30: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f7e0777ca18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #31: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f7e07777097 in 
/opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #32: + 0x3ce5b22 (0x7f7e077abb22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #33: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f7dfa9d54bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #34: + 0xc9039 (0x7f7e1ebdc039 in /opt/conda/lib/libstdc++.so.6)\nframe #35: + 0x76ba (0x7f7e3f50b6ba in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #36: clone + 0x6d (0x7f7e3f24151d in /lib/x86_64-linux-gnu/libc.so.6)\n') 2022-05-18T04:32:29.9976052Z Traceback (most recent call last): 2022-05-18T04:32:29.9976616Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:32:29.9977099Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:32:29.9977728Z File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 6267, in _gpu_add_wrong_gpus 2022-05-18T04:32:29.9978147Z return x.cpu() + y.cuda() 2022-05-18T04:32:29.9978548Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! 2022-05-18T04:32:29.9979070Z Exception raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first): 2022-05-18T04:32:29.9979917Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f7dfa9e91eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:32:29.9981068Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7f7dfa9e4bbe in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:32:29.9981974Z frame #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7f7e048466db in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:29.9982779Z frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7f7e04848b1f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:29.9983650Z frame #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7f7e0484a2e7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:29.9984556Z frame #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7f7e04a17faf in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:29.9985286Z frame #6: + 0x2a45b96 (0x7f7dfd66db96 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:32:29.9985943Z frame #7: + 0x2a45cb6 (0x7f7dfd66dcb6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:32:29.9986769Z frame #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7f7e052ba6c8 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:29.9987486Z frame #9: + 0x2b91b65 (0x7f7e06657b65 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:29.9988125Z frame #10: + 0x2b922f9 (0x7f7e066582f9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:29.9988877Z frame #11: at::_ops::add_Tensor::call(at::Tensor 
const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7f7e052e5d33 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:29.9989596Z frame #12: + 0x2c1ec7 (0x7f7e11392ec7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:29.9990224Z frame #13: + 0x2c2206 (0x7f7e11393206 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:29.9990698Z frame #14: _PyMethodDef_RawFastCallDict + 0x264 (0x5645377ff3f4 in /opt/conda/bin/python) 2022-05-18T04:32:29.9991127Z frame #15: _PyObject_FastCallDict + 0x6e (0x5645377d02ee in /opt/conda/bin/python) 2022-05-18T04:32:29.9991542Z frame #16: + 0x135eb0 (0x5645377ebeb0 in /opt/conda/bin/python) 2022-05-18T04:32:29.9991933Z frame #17: + 0x1f5a6f (0x5645378aba6f in /opt/conda/bin/python) 2022-05-18T04:32:29.9992323Z frame #18: PyNumber_Add + 0x41 (0x56453780a0d1 in /opt/conda/bin/python) 2022-05-18T04:32:29.9992731Z frame #19: _PyEval_EvalFrameDefault + 0xfba (0x564537879f5a in /opt/conda/bin/python) 2022-05-18T04:32:29.9993141Z frame #20: _PyFunction_FastCallDict + 0x118 (0x5645377edcf8 in /opt/conda/bin/python) 2022-05-18T04:32:29.9993567Z frame #21: _PyEval_EvalFrameDefault + 0x1cb8 (0x56453787ac58 in /opt/conda/bin/python) 2022-05-18T04:32:29.9993994Z frame #22: _PyFunction_FastCallDict + 0x118 (0x5645377edcf8 in /opt/conda/bin/python) 2022-05-18T04:32:29.9994605Z frame #23: + 0x9839ef (0x7f7e11a549ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:29.9995378Z frame #24: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f7e11a5339d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:29.9996456Z frame #25: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7f7e11a55d83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:29.9997621Z frame #26: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7f7e11a59dc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:29.9998843Z frame #27: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7f7e07782a9c in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0000111Z frame #28: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f7e11a55b75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0000999Z frame #29: + 0x3cb5e23 (0x7f7e0777be23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0001918Z frame #30: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f7e0777ca18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0002976Z frame #31: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f7e07777097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0003766Z frame #32: + 
0x3ce5b22 (0x7f7e077abb22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0004452Z frame #33: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f7dfa9d54bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:32:30.0004936Z frame #34: + 0xc9039 (0x7f7e1ebdc039 in /opt/conda/lib/libstdc++.so.6) 2022-05-18T04:32:30.0005485Z frame #35: + 0x76ba (0x7f7e3f50b6ba in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:32:30.0005988Z frame #36: clone + 0x6d (0x7f7e3f24151d in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:32:30.0006214Z 2022-05-18T04:32:30.0006234Z 2022-05-18T04:32:30.0159111Z On WorkerInfo(id=0, name=worker0): 2022-05-18T04:32:30.0173346Z RuntimeError('Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!\nException raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f7c809d31eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7f7c809cebbe in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7f7c8a8306db in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7f7c8a832b1f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7f7c8a8342e7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7f7c8aa01faf in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x2a45b96 (0x7f7c83657b96 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #7: + 0x2a45cb6 (0x7f7c83657cb6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7f7c8b2a46c8 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x2b91b65 (0x7f7c8c641b65 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: + 0x2b922f9 (0x7f7c8c6422f9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7f7c8b2cfd33 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: + 0x2c1ec7 (0x7f7c9737cec7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #13: + 0x2c2206 (0x7f7c9737d206 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #14: _PyMethodDef_RawFastCallDict + 0x264 (0x55bf373ea3f4 in /opt/conda/bin/python)\nframe #15: _PyObject_FastCallDict + 0x6e (0x55bf373bb2ee in /opt/conda/bin/python)\nframe #16: + 0x135eb0 (0x55bf373d6eb0 in /opt/conda/bin/python)\nframe #17: + 0x1f5a6f (0x55bf37496a6f in /opt/conda/bin/python)\nframe #18: PyNumber_Add + 0x41 (0x55bf373f50d1 in 
/opt/conda/bin/python)\nframe #19: _PyEval_EvalFrameDefault + 0xfba (0x55bf37464f5a in /opt/conda/bin/python)\nframe #20: _PyFunction_FastCallDict + 0x118 (0x55bf373d8cf8 in /opt/conda/bin/python)\nframe #21: _PyEval_EvalFrameDefault + 0x1cb8 (0x55bf37465c58 in /opt/conda/bin/python)\nframe #22: _PyFunction_FastCallDict + 0x118 (0x55bf373d8cf8 in /opt/conda/bin/python)\nframe #23: + 0x9839ef (0x7f7c97a3e9ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #24: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f7c97a3d39d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #25: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7f7c97a3fd83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #26: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7f7c97a43dc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #27: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7f7c8d76ca9c in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #28: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f7c97a3fb75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #29: + 0x3cb5e23 (0x7f7c8d765e23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #30: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f7c8d766a18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #31: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f7c8d761097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #32: + 0x3ce5b22 (0x7f7c8d795b22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #33: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f7c809bf4bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #34: + 0xc9039 (0x7f7ca4bc6039 in /opt/conda/lib/libstdc++.so.6)\nframe #35: + 0x76ba (0x7f7cc54f56ba in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #36: clone + 0x6d (0x7f7cc522b51d in /lib/x86_64-linux-gnu/libc.so.6)\n') 2022-05-18T04:32:30.0180756Z Traceback (most recent call last): 2022-05-18T04:32:30.0181312Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:32:30.0181811Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:32:30.0182431Z File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 6267, in _gpu_add_wrong_gpus 2022-05-18T04:32:30.0182830Z return x.cpu() + y.cuda() 2022-05-18T04:32:30.0183238Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! 
2022-05-18T04:32:30.0184070Z Exception raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first): 2022-05-18T04:32:30.0185672Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f7c809d31eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:32:30.0187483Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7f7c809cebbe in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:32:30.0188792Z frame #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7f7c8a8306db in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0189611Z frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7f7c8a832b1f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0190512Z frame #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7f7c8a8342e7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0191406Z frame #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7f7c8aa01faf in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0192117Z frame #6: + 0x2a45b96 (0x7f7c83657b96 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:32:30.0192775Z frame #7: + 0x2a45cb6 (0x7f7c83657cb6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:32:30.0193593Z frame #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7f7c8b2a46c8 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0194331Z frame #9: + 0x2b91b65 (0x7f7c8c641b65 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0195340Z frame #10: + 0x2b922f9 (0x7f7c8c6422f9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0196114Z frame #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7f7c8b2cfd33 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0197026Z frame #12: + 0x2c1ec7 (0x7f7c9737cec7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0197723Z frame #13: + 0x2c2206 (0x7f7c9737d206 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0198238Z frame #14: _PyMethodDef_RawFastCallDict + 0x264 (0x55bf373ea3f4 in /opt/conda/bin/python) 2022-05-18T04:32:30.0198746Z frame #15: _PyObject_FastCallDict + 0x6e (0x55bf373bb2ee in /opt/conda/bin/python) 2022-05-18T04:32:30.0199196Z frame #16: + 0x135eb0 (0x55bf373d6eb0 in /opt/conda/bin/python) 2022-05-18T04:32:30.0199628Z frame #17: + 0x1f5a6f (0x55bf37496a6f in /opt/conda/bin/python) 2022-05-18T04:32:30.0200033Z frame #18: PyNumber_Add + 0x41 (0x55bf373f50d1 in /opt/conda/bin/python) 2022-05-18T04:32:30.0200470Z frame #19: _PyEval_EvalFrameDefault + 0xfba (0x55bf37464f5a in /opt/conda/bin/python) 2022-05-18T04:32:30.0200923Z frame #20: _PyFunction_FastCallDict + 0x118 (0x55bf373d8cf8 in /opt/conda/bin/python) 2022-05-18T04:32:30.0201385Z 
frame #21: _PyEval_EvalFrameDefault + 0x1cb8 (0x55bf37465c58 in /opt/conda/bin/python) 2022-05-18T04:32:30.0201822Z frame #22: _PyFunction_FastCallDict + 0x118 (0x55bf373d8cf8 in /opt/conda/bin/python) 2022-05-18T04:32:30.0202480Z frame #23: + 0x9839ef (0x7f7c97a3e9ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0203321Z frame #24: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f7c97a3d39d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0204388Z frame #25: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7f7c97a3fd83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0205556Z frame #26: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7f7c97a43dc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0206852Z frame #27: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7f7c8d76ca9c in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0208206Z frame #28: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f7c97a3fb75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0209157Z frame #29: + 0x3cb5e23 (0x7f7c8d765e23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0210151Z frame #30: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f7c8d766a18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0211863Z frame #31: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f7c8d761097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0212642Z frame #32: + 0x3ce5b22 (0x7f7c8d795b22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0213328Z frame #33: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f7c809bf4bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:32:30.0213828Z frame #34: + 0xc9039 (0x7f7ca4bc6039 in /opt/conda/lib/libstdc++.so.6) 2022-05-18T04:32:30.0214503Z frame #35: + 0x76ba (0x7f7cc54f56ba in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:32:30.0214998Z frame #36: clone + 0x6d (0x7f7cc522b51d in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:32:30.0215230Z 2022-05-18T04:32:30.0215250Z 2022-05-18T04:32:30.0315370Z On WorkerInfo(id=3, name=worker3): 2022-05-18T04:32:30.0334209Z RuntimeError('Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!\nException raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f74299d11eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #1: c10::detail::torchCheckFail(char 
const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7f74299ccbbe in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7f743382e6db in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7f7433830b1f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7f74338322e7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7f74339fffaf in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x2a45b96 (0x7f742c655b96 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #7: + 0x2a45cb6 (0x7f742c655cb6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7f74342a26c8 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x2b91b65 (0x7f743563fb65 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: + 0x2b922f9 (0x7f74356402f9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7f74342cdd33 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: + 0x2c1ec7 (0x7f744037aec7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #13: + 0x2c2206 (0x7f744037b206 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #14: _PyMethodDef_RawFastCallDict + 0x264 (0x557cdd6c53f4 in /opt/conda/bin/python)\nframe #15: _PyObject_FastCallDict + 0x6e (0x557cdd6962ee in /opt/conda/bin/python)\nframe #16: + 0x135eb0 (0x557cdd6b1eb0 in /opt/conda/bin/python)\nframe #17: + 0x1f5a6f (0x557cdd771a6f in /opt/conda/bin/python)\nframe #18: PyNumber_Add + 0x41 (0x557cdd6d00d1 in /opt/conda/bin/python)\nframe #19: _PyEval_EvalFrameDefault + 0xfba (0x557cdd73ff5a in /opt/conda/bin/python)\nframe #20: _PyFunction_FastCallDict + 0x118 (0x557cdd6b3cf8 in /opt/conda/bin/python)\nframe #21: _PyEval_EvalFrameDefault + 0x1cb8 (0x557cdd740c58 in /opt/conda/bin/python)\nframe #22: _PyFunction_FastCallDict + 0x118 (0x557cdd6b3cf8 in /opt/conda/bin/python)\nframe #23: + 0x9839ef (0x7f7440a3c9ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #24: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f7440a3b39d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #25: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7f7440a3dd83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #26: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7f7440a41dc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #27: 
torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7f743676aa9c in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #28: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f7440a3db75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #29: + 0x3cb5e23 (0x7f7436763e23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #30: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f7436764a18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #31: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f743675f097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #32: + 0x3ce5b22 (0x7f7436793b22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #33: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f74299bd4bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #34: + 0xc9039 (0x7f744dbc4039 in /opt/conda/lib/libstdc++.so.6)\nframe #35: + 0x76ba (0x7f746e4f36ba in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #36: clone + 0x6d (0x7f746e22951d in /lib/x86_64-linux-gnu/libc.so.6)\n') 2022-05-18T04:32:30.0341640Z Traceback (most recent call last): 2022-05-18T04:32:30.0342239Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:32:30.0342699Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:32:30.0343318Z File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 6267, in _gpu_add_wrong_gpus 2022-05-18T04:32:30.0343733Z return x.cpu() + y.cuda() 2022-05-18T04:32:30.0344116Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! 
2022-05-18T04:32:30.0344661Z Exception raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first): 2022-05-18T04:32:30.0345518Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7f74299d11eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:32:30.0346495Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7f74299ccbbe in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:32:30.0347388Z frame #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7f743382e6db in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0348165Z frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7f7433830b1f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0349135Z frame #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7f74338322e7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0350070Z frame #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7f74339fffaf in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0350842Z frame #6: + 0x2a45b96 (0x7f742c655b96 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:32:30.0351477Z frame #7: + 0x2a45cb6 (0x7f742c655cb6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:32:30.0352294Z frame #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7f74342a26c8 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0353034Z frame #9: + 0x2b91b65 (0x7f743563fb65 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0353666Z frame #10: + 0x2b922f9 (0x7f74356402f9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0354420Z frame #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7f74342cdd33 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0355120Z frame #12: + 0x2c1ec7 (0x7f744037aec7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0355762Z frame #13: + 0x2c2206 (0x7f744037b206 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0356244Z frame #14: _PyMethodDef_RawFastCallDict + 0x264 (0x557cdd6c53f4 in /opt/conda/bin/python) 2022-05-18T04:32:30.0356665Z frame #15: _PyObject_FastCallDict + 0x6e (0x557cdd6962ee in /opt/conda/bin/python) 2022-05-18T04:32:30.0357086Z frame #16: + 0x135eb0 (0x557cdd6b1eb0 in /opt/conda/bin/python) 2022-05-18T04:32:30.0357491Z frame #17: + 0x1f5a6f (0x557cdd771a6f in /opt/conda/bin/python) 2022-05-18T04:32:30.0357894Z frame #18: PyNumber_Add + 0x41 (0x557cdd6d00d1 in /opt/conda/bin/python) 2022-05-18T04:32:30.0358292Z frame #19: _PyEval_EvalFrameDefault + 0xfba (0x557cdd73ff5a in /opt/conda/bin/python) 2022-05-18T04:32:30.0358725Z frame #20: _PyFunction_FastCallDict + 0x118 (0x557cdd6b3cf8 in /opt/conda/bin/python) 2022-05-18T04:32:30.0359149Z 
frame #21: _PyEval_EvalFrameDefault + 0x1cb8 (0x557cdd740c58 in /opt/conda/bin/python) 2022-05-18T04:32:30.0359571Z frame #22: _PyFunction_FastCallDict + 0x118 (0x557cdd6b3cf8 in /opt/conda/bin/python) 2022-05-18T04:32:30.0360168Z frame #23: + 0x9839ef (0x7f7440a3c9ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0360957Z frame #24: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7f7440a3b39d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0361963Z frame #25: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7f7440a3dd83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0363075Z frame #26: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7f7440a41dc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0364279Z frame #27: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7f743676aa9c in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0365647Z frame #28: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7f7440a3db75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0366534Z frame #29: + 0x3cb5e23 (0x7f7436763e23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0367461Z frame #30: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7f7436764a18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0368512Z frame #31: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7f743675f097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0369288Z frame #32: + 0x3ce5b22 (0x7f7436793b22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0369957Z frame #33: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f74299bd4bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:32:30.0371096Z frame #34: + 0xc9039 (0x7f744dbc4039 in /opt/conda/lib/libstdc++.so.6) 2022-05-18T04:32:30.0371656Z frame #35: + 0x76ba (0x7f746e4f36ba in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:32:30.0372164Z frame #36: clone + 0x6d (0x7f746e22951d in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:32:30.0372386Z 2022-05-18T04:32:30.0372407Z 2022-05-18T04:32:30.0415788Z On WorkerInfo(id=2, name=worker2): 2022-05-18T04:32:30.0429523Z RuntimeError('Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!\nException raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first):\nframe #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7efe057d91eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #1: c10::detail::torchCheckFail(char 
const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7efe057d4bbe in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7efe0f6366db in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7efe0f638b1f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7efe0f63a2e7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7efe0f807faf in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #6: + 0x2a45b96 (0x7efe0845db96 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #7: + 0x2a45cb6 (0x7efe0845dcb6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so)\nframe #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7efe100aa6c8 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #9: + 0x2b91b65 (0x7efe11447b65 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #10: + 0x2b922f9 (0x7efe114482f9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7efe100d5d33 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #12: + 0x2c1ec7 (0x7efe1c182ec7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #13: + 0x2c2206 (0x7efe1c183206 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #14: _PyMethodDef_RawFastCallDict + 0x264 (0x5604535763f4 in /opt/conda/bin/python)\nframe #15: _PyObject_FastCallDict + 0x6e (0x5604535472ee in /opt/conda/bin/python)\nframe #16: + 0x135eb0 (0x560453562eb0 in /opt/conda/bin/python)\nframe #17: + 0x1f5a6f (0x560453622a6f in /opt/conda/bin/python)\nframe #18: PyNumber_Add + 0x41 (0x5604535810d1 in /opt/conda/bin/python)\nframe #19: _PyEval_EvalFrameDefault + 0xfba (0x5604535f0f5a in /opt/conda/bin/python)\nframe #20: _PyFunction_FastCallDict + 0x118 (0x560453564cf8 in /opt/conda/bin/python)\nframe #21: _PyEval_EvalFrameDefault + 0x1cb8 (0x5604535f1c58 in /opt/conda/bin/python)\nframe #22: _PyFunction_FastCallDict + 0x118 (0x560453564cf8 in /opt/conda/bin/python)\nframe #23: + 0x9839ef (0x7efe1c8449ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #24: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7efe1c84339d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #25: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7efe1c845d83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #26: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7efe1c849dc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #27: 
torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7efe12572a9c in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #28: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7efe1c845b75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so)\nframe #29: + 0x3cb5e23 (0x7efe1256be23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #30: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7efe1256ca18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #31: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7efe12567097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #32: + 0x3ce5b22 (0x7efe1259bb22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)\nframe #33: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7efe057c54bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so)\nframe #34: + 0xc9039 (0x7efe299cc039 in /opt/conda/lib/libstdc++.so.6)\nframe #35: + 0x76ba (0x7efe4a2fb6ba in /lib/x86_64-linux-gnu/libpthread.so.0)\nframe #36: clone + 0x6d (0x7efe4a03151d in /lib/x86_64-linux-gnu/libc.so.6)\n') 2022-05-18T04:32:30.0437003Z Traceback (most recent call last): 2022-05-18T04:32:30.0437543Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/rpc/internal.py", line 206, in _run_function 2022-05-18T04:32:30.0438004Z result = python_udf.func(*python_udf.args, **python_udf.kwargs) 2022-05-18T04:32:30.0438727Z File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/distributed/rpc/rpc_test.py", line 6267, in _gpu_add_wrong_gpus 2022-05-18T04:32:30.0439156Z return x.cpu() + y.cuda() 2022-05-18T04:32:30.0439543Z RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! 
2022-05-18T04:32:30.0440087Z Exception raised from compute_types at /var/lib/jenkins/workspace/aten/src/ATen/TensorIterator.cpp:484 (most recent call first): 2022-05-18T04:32:30.0440940Z frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string, std::allocator >) + 0x6b (0x7efe057d91eb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:32:30.0441924Z frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string, std::allocator > const&) + 0xce (0x7efe057d4bbe in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:32:30.0442821Z frame #2: at::TensorIteratorBase::compute_types(at::TensorIteratorConfig const&) + 0xc2b (0x7efe0f6366db in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0443611Z frame #3: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x7f (0x7efe0f638b1f in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0444503Z frame #4: at::TensorIteratorBase::build_borrowing_binary_op(at::TensorBase const&, at::TensorBase const&, at::TensorBase const&) + 0xf7 (0x7efe0f63a2e7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0445402Z frame #5: at::meta::structured_add_Tensor::meta(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x2f (0x7efe0f807faf in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0446134Z frame #6: + 0x2a45b96 (0x7efe0845db96 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:32:30.0446776Z frame #7: + 0x2a45cb6 (0x7efe0845dcb6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so) 2022-05-18T04:32:30.0447602Z frame #8: at::_ops::add_Tensor::redispatch(c10::DispatchKeySet, at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x98 (0x7efe100aa6c8 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0448343Z frame #9: + 0x2b91b65 (0x7efe11447b65 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0448982Z frame #10: + 0x2b922f9 (0x7efe114482f9 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0449750Z frame #11: at::_ops::add_Tensor::call(at::Tensor const&, at::Tensor const&, c10::Scalar const&) + 0x173 (0x7efe100d5d33 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0450904Z frame #12: + 0x2c1ec7 (0x7efe1c182ec7 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0451657Z frame #13: + 0x2c2206 (0x7efe1c183206 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0452144Z frame #14: _PyMethodDef_RawFastCallDict + 0x264 (0x5604535763f4 in /opt/conda/bin/python) 2022-05-18T04:32:30.0452553Z frame #15: _PyObject_FastCallDict + 0x6e (0x5604535472ee in /opt/conda/bin/python) 2022-05-18T04:32:30.0452962Z frame #16: + 0x135eb0 (0x560453562eb0 in /opt/conda/bin/python) 2022-05-18T04:32:30.0453361Z frame #17: + 0x1f5a6f (0x560453622a6f in /opt/conda/bin/python) 2022-05-18T04:32:30.0453862Z frame #18: PyNumber_Add + 0x41 (0x5604535810d1 in /opt/conda/bin/python) 2022-05-18T04:32:30.0454254Z frame #19: _PyEval_EvalFrameDefault + 0xfba (0x5604535f0f5a in /opt/conda/bin/python) 2022-05-18T04:32:30.0454679Z frame #20: _PyFunction_FastCallDict + 0x118 (0x560453564cf8 in /opt/conda/bin/python) 2022-05-18T04:32:30.0455160Z 
frame #21: _PyEval_EvalFrameDefault + 0x1cb8 (0x5604535f1c58 in /opt/conda/bin/python) 2022-05-18T04:32:30.0455591Z frame #22: _PyFunction_FastCallDict + 0x118 (0x560453564cf8 in /opt/conda/bin/python) 2022-05-18T04:32:30.0456191Z frame #23: + 0x9839ef (0x7efe1c8449ef in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0456984Z frame #24: torch::distributed::rpc::PythonRpcHandler::runPythonUdf(pybind11::object const&) + 0x7d (0x7efe1c84339d in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0457989Z frame #25: torch::distributed::rpc::RequestCallbackImpl::runPythonFunction(pybind11::object const&, std::vector >, bool) const + 0x83 (0x7efe1c845d83 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0459112Z frame #26: torch::distributed::rpc::RequestCallbackImpl::processPythonCall(torch::distributed::rpc::RpcCommandBase&, std::vector >) const + 0x96 (0x7efe1c849dc6 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0460320Z frame #27: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x10c (0x7efe12572a9c in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0461578Z frame #28: torch::distributed::rpc::RequestCallbackImpl::processRpcWithErrors(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector >) const + 0x65 (0x7efe1c845b75 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_python.so) 2022-05-18T04:32:30.0462475Z frame #29: + 0x3cb5e23 (0x7efe1256be23 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0463414Z frame #30: torch::distributed::rpc::RequestCallbackNoPython::processMessage(torch::distributed::rpc::Message&, std::vector >) const + 0x538 (0x7efe1256ca18 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0464474Z frame #31: torch::distributed::rpc::RequestCallback::operator()(torch::distributed::rpc::Message&, std::vector >) const + 0x57 (0x7efe12567097 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0465256Z frame #32: + 0x3ce5b22 (0x7efe1259bb22 in /opt/conda/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so) 2022-05-18T04:32:30.0465930Z frame #33: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7efe057c54bb in /opt/conda/lib/python3.7/site-packages/torch/lib/libc10.so) 2022-05-18T04:32:30.0466428Z frame #34: + 0xc9039 (0x7efe299cc039 in /opt/conda/lib/libstdc++.so.6) 2022-05-18T04:32:30.0466988Z frame #35: + 0x76ba (0x7efe4a2fb6ba in /lib/x86_64-linux-gnu/libpthread.so.0) 2022-05-18T04:32:30.0467502Z frame #36: clone + 0x6d (0x7efe4a03151d in /lib/x86_64-linux-gnu/libc.so.6) 2022-05-18T04:32:30.0467730Z 2022-05-18T04:32:30.0467752Z 2022-05-18T04:32:30.4505228Z ok (6.476s) 2022-05-18T04:32:30.4505435Z 2022-05-18T04:32:30.4505874Z ---------------------------------------------------------------------- 2022-05-18T04:32:30.4506221Z Ran 1 test in 6.476s 2022-05-18T04:32:30.4506388Z 2022-05-18T04:32:30.4506486Z OK 2022-05-18T04:32:30.4506625Z 2022-05-18T04:32:30.4506770Z Generating XML reports... 
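The _gpu_add_wrong_gpus failures reported above are the expected outcome of mixing a CPU and a CUDA tensor in a single op (the test asserts that this RuntimeError is raised on the remote workers, which is why the case still finishes as "ok"). A minimal Python sketch of the failing pattern and the usual remedy, assuming one visible CUDA device; the tensors below are illustrative and not taken from the test:

    import torch

    # One operand on the CPU, one on cuda:0 -- the same mix as x.cpu() + y.cuda().
    x = torch.ones(2)                    # cpu
    y = torch.ones(2, device="cuda:0")   # cuda:0

    # x + y would raise: "Expected all tensors to be on the same device, ..."
    # The usual fix is to move both operands onto one device before the op:
    z = x.to(y.device) + y               # both on cuda:0 now
    print(z.device)                      # -> cuda:0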
2022-05-18T04:32:30.4549582Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043223.xml 2022-05-18T04:32:31.6035406Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsyi6vr01 2022-05-18T04:32:31.6036541Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsyi6vr01/_remote_module_non_scriptable.py 2022-05-18T04:32:31.9634335Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:32:31.9648625Z 2022-05-18T04:32:31.9649067Z Running tests... 2022-05-18T04:32:31.9649514Z ---------------------------------------------------------------------- 2022-05-18T04:32:33.5847826Z test_devices_option_mismatch (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:32:33.6520553Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 23823 2022-05-18T04:32:33.6626848Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 23824 2022-05-18T04:32:33.6737035Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 23825 2022-05-18T04:32:33.6844620Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 23826 2022-05-18T04:32:34.5803006Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwg7knj3t 2022-05-18T04:32:34.5804298Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwg7knj3t/_remote_module_non_scriptable.py 2022-05-18T04:32:34.5936066Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4vca_dns 2022-05-18T04:32:34.5938725Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4vca_dns/_remote_module_non_scriptable.py 2022-05-18T04:32:34.6176556Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_20hqjf9 2022-05-18T04:32:34.6178832Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_20hqjf9/_remote_module_non_scriptable.py 2022-05-18T04:32:34.6433236Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoj4qg3k3 2022-05-18T04:32:34.6436097Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoj4qg3k3/_remote_module_non_scriptable.py 2022-05-18T04:32:34.9498311Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:34.9520607Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:34.9841337Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:32:35.0041579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:32:35.2902694Z ok (3.325s) 2022-05-18T04:32:35.2902936Z 2022-05-18T04:32:35.2903334Z ---------------------------------------------------------------------- 2022-05-18T04:32:35.2903680Z Ran 1 test in 3.325s 2022-05-18T04:32:35.2903829Z 2022-05-18T04:32:35.2903942Z OK 2022-05-18T04:32:35.2904077Z 2022-05-18T04:32:35.2904209Z Generating XML reports... 
2022-05-18T04:32:35.2947340Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043231.xml 2022-05-18T04:32:36.4545974Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq23ev3yp 2022-05-18T04:32:36.4547320Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq23ev3yp/_remote_module_non_scriptable.py 2022-05-18T04:32:36.8153964Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:32:36.8168560Z 2022-05-18T04:32:36.8168784Z Running tests... 2022-05-18T04:32:36.8169220Z ---------------------------------------------------------------------- 2022-05-18T04:32:38.4253414Z test_devices_option_mismatch_reverse (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:32:38.4891058Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24016 2022-05-18T04:32:38.4998916Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24017 2022-05-18T04:32:38.5106894Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 24018 2022-05-18T04:32:38.5213685Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 24019 2022-05-18T04:32:39.4007445Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpup2qdr4n 2022-05-18T04:32:39.4008665Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpup2qdr4n/_remote_module_non_scriptable.py 2022-05-18T04:32:39.4180798Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsieepdie 2022-05-18T04:32:39.4183106Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsieepdie/_remote_module_non_scriptable.py 2022-05-18T04:32:39.4236587Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4q0b0y2i 2022-05-18T04:32:39.4238801Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4q0b0y2i/_remote_module_non_scriptable.py 2022-05-18T04:32:39.4354913Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbafagr__ 2022-05-18T04:32:39.4356905Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbafagr__/_remote_module_non_scriptable.py 2022-05-18T04:32:39.7701241Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:39.7716656Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:39.7724556Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:32:39.7926050Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:32:40.0268596Z ok (3.210s) 2022-05-18T04:32:40.0268822Z 2022-05-18T04:32:40.0269214Z ---------------------------------------------------------------------- 2022-05-18T04:32:40.0269591Z Ran 1 test in 3.210s 2022-05-18T04:32:40.0269740Z 2022-05-18T04:32:40.0269838Z OK 2022-05-18T04:32:40.0269978Z 2022-05-18T04:32:40.0270112Z Generating XML reports... 
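The test_devices_option_mismatch / test_devices_option_mismatch_reverse cases above exercise the TensorPipe agent's device-map options. A hedged sketch of how such options are typically constructed; the worker names, ranks, rendezvous settings and the 0 -> 1 mapping are chosen for illustration only:

    import os
    import torch.distributed.rpc as rpc

    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")

    options = rpc.TensorPipeRpcBackendOptions(
        num_worker_threads=8,
        # CUDA tensors sent from this worker to "worker1" land on its cuda:1.
        device_maps={"worker1": {0: 1}},
        # Devices this agent may use; inconsistencies between this list and
        # device_maps are the kind of misconfiguration these tests probe.
        devices=["cuda:0"],
    )
    rpc.init_rpc("worker0", rank=0, world_size=2, rpc_backend_options=options)
    # ... issue rpc_sync / rpc.remote calls carrying CUDA tensors here ...
    rpc.shutdown()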
2022-05-18T04:32:40.0313194Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043236.xml 2022-05-18T04:32:41.1817051Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwowhkync 2022-05-18T04:32:41.1818027Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwowhkync/_remote_module_non_scriptable.py 2022-05-18T04:32:41.5395906Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:32:41.5410095Z 2022-05-18T04:32:41.5410361Z Running tests... 2022-05-18T04:32:41.5410802Z ---------------------------------------------------------------------- 2022-05-18T04:32:43.1558414Z test_meta_multiple_tensors (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:32:43.2180781Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24209 2022-05-18T04:32:43.2285232Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24210 2022-05-18T04:32:43.2393394Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 24211 2022-05-18T04:32:43.2501330Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 24212 2022-05-18T04:32:44.2049555Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwnduajlt 2022-05-18T04:32:44.2050704Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwnduajlt/_remote_module_non_scriptable.py 2022-05-18T04:32:44.2220735Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2mu1i1_9 2022-05-18T04:32:44.2223499Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2mu1i1_9/_remote_module_non_scriptable.py 2022-05-18T04:32:44.2829546Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8ss3ault 2022-05-18T04:32:44.2831633Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8ss3ault/_remote_module_non_scriptable.py 2022-05-18T04:32:44.2843952Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn5quaqgn 2022-05-18T04:32:44.2847232Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn5quaqgn/_remote_module_non_scriptable.py 2022-05-18T04:32:44.5681881Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:32:44.5765613Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:44.6404170Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:32:44.6518916Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:48.0638761Z ok (6.522s) 2022-05-18T04:32:48.0639175Z 2022-05-18T04:32:48.0639607Z ---------------------------------------------------------------------- 2022-05-18T04:32:48.0639968Z Ran 1 test in 6.523s 2022-05-18T04:32:48.0640119Z 2022-05-18T04:32:48.0640218Z OK 2022-05-18T04:32:48.0640356Z 2022-05-18T04:32:48.0640493Z Generating XML reports... 
2022-05-18T04:32:48.0684123Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043241.xml 2022-05-18T04:32:49.2348946Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbml_g8v3 2022-05-18T04:32:49.2351144Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbml_g8v3/_remote_module_non_scriptable.py 2022-05-18T04:32:49.6037602Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:32:49.6052454Z 2022-05-18T04:32:49.6052728Z Running tests... 2022-05-18T04:32:49.6053166Z ---------------------------------------------------------------------- 2022-05-18T04:32:51.2316967Z test_owner_rref_forward_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:32:51.2961887Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24562 2022-05-18T04:32:51.3068057Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24563 2022-05-18T04:32:51.3177775Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 24564 2022-05-18T04:32:51.3285910Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 24565 2022-05-18T04:32:52.2842932Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprwj11vsp 2022-05-18T04:32:52.2843829Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprwj11vsp/_remote_module_non_scriptable.py 2022-05-18T04:32:52.2977599Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9kt42bx5 2022-05-18T04:32:52.2980672Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9kt42bx5/_remote_module_non_scriptable.py 2022-05-18T04:32:52.3059283Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9552b_m7 2022-05-18T04:32:52.3062159Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9552b_m7/_remote_module_non_scriptable.py 2022-05-18T04:32:52.3469741Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpta4785yg 2022-05-18T04:32:52.3472103Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpta4785yg/_remote_module_non_scriptable.py 2022-05-18T04:32:52.6395046Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:32:52.6536361Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:32:52.6589737Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:32:52.7019937Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:32:57.6442271Z ok (8.039s) 2022-05-18T04:32:57.6442524Z 2022-05-18T04:32:57.6443101Z ---------------------------------------------------------------------- 2022-05-18T04:32:57.6443436Z Ran 1 test in 8.039s 2022-05-18T04:32:57.6443602Z 2022-05-18T04:32:57.6443696Z OK 2022-05-18T04:32:57.6443832Z 2022-05-18T04:32:57.6443967Z Generating XML reports... 
2022-05-18T04:32:57.6486481Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043249.xml 2022-05-18T04:32:58.8338441Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg4zm8ny8 2022-05-18T04:32:58.8342925Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg4zm8ny8/_remote_module_non_scriptable.py 2022-05-18T04:32:59.2076633Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:32:59.2091878Z 2022-05-18T04:32:59.2092322Z Running tests... 2022-05-18T04:32:59.2092764Z ---------------------------------------------------------------------- 2022-05-18T04:33:00.8539260Z test_owner_rref_forward_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:33:00.9190977Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 24786 2022-05-18T04:33:00.9295502Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 24787 2022-05-18T04:33:00.9403645Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 24788 2022-05-18T04:33:00.9513071Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 24789 2022-05-18T04:33:01.8224813Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps5ig4vou 2022-05-18T04:33:01.8225825Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps5ig4vou/_remote_module_non_scriptable.py 2022-05-18T04:33:01.8262687Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp3h6ztqt 2022-05-18T04:33:01.8265550Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp3h6ztqt/_remote_module_non_scriptable.py 2022-05-18T04:33:01.8310818Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb3obypxw 2022-05-18T04:33:01.8313446Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb3obypxw/_remote_module_non_scriptable.py 2022-05-18T04:33:01.8866089Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpucfej3tw 2022-05-18T04:33:01.8868174Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpucfej3tw/_remote_module_non_scriptable.py 2022-05-18T04:33:02.1832518Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:02.1864198Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:33:02.1928060Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:02.2436490Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:33:08.7705601Z ok (9.561s) 2022-05-18T04:33:08.7705929Z 2022-05-18T04:33:08.7706477Z ---------------------------------------------------------------------- 2022-05-18T04:33:08.7706820Z Ran 1 test in 9.561s 2022-05-18T04:33:08.7706993Z 2022-05-18T04:33:08.7707109Z OK 2022-05-18T04:33:08.7707249Z 2022-05-18T04:33:08.7707392Z Generating XML reports... 
2022-05-18T04:33:08.7753191Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043259.xml 2022-05-18T04:33:09.9598929Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbs8jp1mj 2022-05-18T04:33:09.9599998Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbs8jp1mj/_remote_module_non_scriptable.py 2022-05-18T04:33:10.3354795Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:33:10.3370334Z 2022-05-18T04:33:10.3370711Z Running tests... 2022-05-18T04:33:10.3371373Z ---------------------------------------------------------------------- 2022-05-18T04:33:11.9818813Z test_owner_rref_forward_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:33:12.0467833Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25011 2022-05-18T04:33:12.0573384Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25012 2022-05-18T04:33:12.0682222Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 25013 2022-05-18T04:33:12.0790786Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 25014 2022-05-18T04:33:12.9370340Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3en8c3ij 2022-05-18T04:33:12.9371196Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3en8c3ij/_remote_module_non_scriptable.py 2022-05-18T04:33:12.9649783Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgm7d3esh 2022-05-18T04:33:12.9652257Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgm7d3esh/_remote_module_non_scriptable.py 2022-05-18T04:33:12.9747954Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjf9euyvz 2022-05-18T04:33:12.9750845Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjf9euyvz/_remote_module_non_scriptable.py 2022-05-18T04:33:13.0072046Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdj6tn8r_ 2022-05-18T04:33:13.0074669Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdj6tn8r_/_remote_module_non_scriptable.py 2022-05-18T04:33:13.2948874Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:13.3205351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:33:13.3318942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:33:13.3751466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:19.5970392Z ok (9.260s) 2022-05-18T04:33:19.5970612Z 2022-05-18T04:33:19.5971037Z ---------------------------------------------------------------------- 2022-05-18T04:33:19.5971623Z Ran 1 test in 9.260s 2022-05-18T04:33:19.5971795Z 2022-05-18T04:33:19.5971894Z OK 2022-05-18T04:33:19.5972036Z 2022-05-18T04:33:19.5972171Z Generating XML reports... 
2022-05-18T04:33:19.6015145Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043310.xml 2022-05-18T04:33:20.7777104Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr_0m2e57 2022-05-18T04:33:20.7778082Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr_0m2e57/_remote_module_non_scriptable.py 2022-05-18T04:33:21.1534841Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:33:21.1550847Z 2022-05-18T04:33:21.1550988Z Running tests... 2022-05-18T04:33:21.1551805Z ---------------------------------------------------------------------- 2022-05-18T04:33:22.8150252Z test_owner_rref_forward_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:33:22.8797832Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25236 2022-05-18T04:33:22.8907258Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25237 2022-05-18T04:33:22.9018014Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 25238 2022-05-18T04:33:22.9127878Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 25239 2022-05-18T04:33:23.7813702Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp72b5re3z 2022-05-18T04:33:23.7815007Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp72b5re3z/_remote_module_non_scriptable.py 2022-05-18T04:33:23.7921854Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpypdkjslh 2022-05-18T04:33:23.7925007Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpypdkjslh/_remote_module_non_scriptable.py 2022-05-18T04:33:23.7925602Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf5phzp41 2022-05-18T04:33:23.7928690Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf5phzp41/_remote_module_non_scriptable.py 2022-05-18T04:33:23.8341643Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphr8lkpzr 2022-05-18T04:33:23.8344161Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphr8lkpzr/_remote_module_non_scriptable.py 2022-05-18T04:33:24.1351086Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:33:24.1497624Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:24.1597665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:24.1918331Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:33:29.1279629Z ok (7.972s) 2022-05-18T04:33:29.1279845Z 2022-05-18T04:33:29.1280245Z ---------------------------------------------------------------------- 2022-05-18T04:33:29.1280599Z Ran 1 test in 7.973s 2022-05-18T04:33:29.1281940Z 2022-05-18T04:33:29.1282218Z OK 2022-05-18T04:33:29.1282415Z 2022-05-18T04:33:29.1282569Z Generating XML reports... 
2022-05-18T04:33:29.1325018Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043321.xml 2022-05-18T04:33:30.3033738Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe0huayzg 2022-05-18T04:33:30.3034750Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe0huayzg/_remote_module_non_scriptable.py 2022-05-18T04:33:30.6748628Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:33:30.6764553Z 2022-05-18T04:33:30.6764939Z Running tests... 2022-05-18T04:33:30.6765458Z ---------------------------------------------------------------------- 2022-05-18T04:33:32.3240442Z test_rref_as_arg_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:33:32.3885500Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25460 2022-05-18T04:33:32.3990500Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25461 2022-05-18T04:33:32.4100356Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 25462 2022-05-18T04:33:32.4209196Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 25463 2022-05-18T04:33:33.3217999Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqh70ndae 2022-05-18T04:33:33.3219435Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqh70ndae/_remote_module_non_scriptable.py 2022-05-18T04:33:33.3720079Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplx9t9wyx 2022-05-18T04:33:33.3722674Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplx9t9wyx/_remote_module_non_scriptable.py 2022-05-18T04:33:33.3736498Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsze83fnn 2022-05-18T04:33:33.3739286Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsze83fnn/_remote_module_non_scriptable.py 2022-05-18T04:33:33.3923560Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmy9zrwt0 2022-05-18T04:33:33.3926230Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmy9zrwt0/_remote_module_non_scriptable.py 2022-05-18T04:33:33.6981234Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:33:33.7402162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:33.7442260Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:33.7509542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:33:47.2598189Z ok (16.583s) 2022-05-18T04:33:47.2598426Z 2022-05-18T04:33:47.2598824Z ---------------------------------------------------------------------- 2022-05-18T04:33:47.2599182Z Ran 1 test in 16.583s 2022-05-18T04:33:47.2599354Z 2022-05-18T04:33:47.2599459Z OK 2022-05-18T04:33:47.2599601Z 2022-05-18T04:33:47.2599741Z Generating XML reports... 
2022-05-18T04:33:47.2643831Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043330.xml 2022-05-18T04:33:48.4296405Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk95zal44 2022-05-18T04:33:48.4297794Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk95zal44/_remote_module_non_scriptable.py 2022-05-18T04:33:48.8010070Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:33:48.8025697Z 2022-05-18T04:33:48.8025850Z Running tests... 2022-05-18T04:33:48.8026671Z ---------------------------------------------------------------------- 2022-05-18T04:33:50.4345248Z test_rref_as_arg_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:33:50.4999699Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 25813 2022-05-18T04:33:50.5105648Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 25814 2022-05-18T04:33:50.5214354Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 25815 2022-05-18T04:33:50.5322431Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 25816 2022-05-18T04:33:51.4151739Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuk59xgu4 2022-05-18T04:33:51.4152671Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuk59xgu4/_remote_module_non_scriptable.py 2022-05-18T04:33:51.4312884Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpajrt_1h3 2022-05-18T04:33:51.4315555Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpajrt_1h3/_remote_module_non_scriptable.py 2022-05-18T04:33:51.4783702Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfiatz11c 2022-05-18T04:33:51.4786396Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfiatz11c/_remote_module_non_scriptable.py 2022-05-18T04:33:51.4910472Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwayl2uus 2022-05-18T04:33:51.4913457Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwayl2uus/_remote_module_non_scriptable.py 2022-05-18T04:33:51.7763524Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:33:51.7976528Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:33:51.8360802Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:33:51.8430861Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:34:07.1702297Z ok (18.367s) 2022-05-18T04:34:07.1702657Z 2022-05-18T04:34:07.1703406Z ---------------------------------------------------------------------- 2022-05-18T04:34:07.1704032Z Ran 1 test in 18.368s 2022-05-18T04:34:07.1704206Z 2022-05-18T04:34:07.1704307Z OK 2022-05-18T04:34:07.1704450Z 2022-05-18T04:34:07.1704934Z Generating XML reports... 
2022-05-18T04:34:07.1749609Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043348.xml 2022-05-18T04:34:08.3406233Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp9lwi877 2022-05-18T04:34:08.3407796Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp9lwi877/_remote_module_non_scriptable.py 2022-05-18T04:34:08.7099570Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:34:08.7115655Z 2022-05-18T04:34:08.7115911Z Running tests... 2022-05-18T04:34:08.7116363Z ---------------------------------------------------------------------- 2022-05-18T04:34:10.3494773Z test_rref_as_arg_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:10.4172222Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26172 2022-05-18T04:34:10.4277177Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26173 2022-05-18T04:34:10.4386896Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 26174 2022-05-18T04:34:10.4497062Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 26175 2022-05-18T04:34:11.3625792Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp33r5j6w 2022-05-18T04:34:11.3626650Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp33r5j6w/_remote_module_non_scriptable.py 2022-05-18T04:34:11.3857128Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo2dgo9j7 2022-05-18T04:34:11.3860170Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo2dgo9j7/_remote_module_non_scriptable.py 2022-05-18T04:34:11.3861324Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy39_jq9g 2022-05-18T04:34:11.3864159Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy39_jq9g/_remote_module_non_scriptable.py 2022-05-18T04:34:11.3915448Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8zgqd_1d 2022-05-18T04:34:11.3918143Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8zgqd_1d/_remote_module_non_scriptable.py 2022-05-18T04:34:11.7206522Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:11.7442395Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:34:11.7445202Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:34:11.7565475Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:25.2841845Z ok (16.572s) 2022-05-18T04:34:25.2842300Z 2022-05-18T04:34:25.2842896Z ---------------------------------------------------------------------- 2022-05-18T04:34:25.2843276Z Ran 1 test in 16.573s 2022-05-18T04:34:25.2843476Z 2022-05-18T04:34:25.2843580Z OK 2022-05-18T04:34:25.2843722Z 2022-05-18T04:34:25.2843865Z Generating XML reports... 
2022-05-18T04:34:25.2889356Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043408.xml 2022-05-18T04:34:26.4417111Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvrl93uez 2022-05-18T04:34:26.4418174Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvrl93uez/_remote_module_non_scriptable.py 2022-05-18T04:34:26.8002031Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:34:26.8017160Z 2022-05-18T04:34:26.8017561Z Running tests... 2022-05-18T04:34:26.8018009Z ---------------------------------------------------------------------- 2022-05-18T04:34:28.4080443Z test_rref_as_arg_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:28.4753643Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26525 2022-05-18T04:34:28.4860029Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26526 2022-05-18T04:34:28.4967676Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 26527 2022-05-18T04:34:28.5078790Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 26528 2022-05-18T04:34:29.4634128Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdflc_2sy 2022-05-18T04:34:29.4634749Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdflc_2sy/_remote_module_non_scriptable.py 2022-05-18T04:34:29.4735716Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplvvxkxs9 2022-05-18T04:34:29.4738603Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplvvxkxs9/_remote_module_non_scriptable.py 2022-05-18T04:34:29.4800344Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpujk6l2kv 2022-05-18T04:34:29.4803077Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpujk6l2kv/_remote_module_non_scriptable.py 2022-05-18T04:34:29.5629155Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmgqaj6m9 2022-05-18T04:34:29.5630815Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmgqaj6m9/_remote_module_non_scriptable.py 2022-05-18T04:34:29.8273801Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:34:29.8294125Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:34:29.8394794Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:29.9144073Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:34:45.4471371Z ok (18.645s) 2022-05-18T04:34:45.4471602Z 2022-05-18T04:34:45.4473692Z ---------------------------------------------------------------------- 2022-05-18T04:34:45.4474118Z Ran 1 test in 18.645s 2022-05-18T04:34:45.4474295Z 2022-05-18T04:34:45.4474395Z OK 2022-05-18T04:34:45.4474514Z 2022-05-18T04:34:45.4474653Z Generating XML reports... 
2022-05-18T04:34:45.4516659Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043426.xml 2022-05-18T04:34:46.6170861Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpncpga2sn 2022-05-18T04:34:46.6171946Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpncpga2sn/_remote_module_non_scriptable.py 2022-05-18T04:34:46.9887454Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:34:46.9902895Z 2022-05-18T04:34:46.9903224Z Running tests... 2022-05-18T04:34:46.9903726Z ---------------------------------------------------------------------- 2022-05-18T04:34:48.6584328Z test_rref_as_arg_synchronization5 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:34:48.7229595Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 26884 2022-05-18T04:34:48.7334061Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 26885 2022-05-18T04:34:48.7442819Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 26886 2022-05-18T04:34:48.7555328Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 26887 2022-05-18T04:34:49.6678007Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7sib5v1j 2022-05-18T04:34:49.6679307Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7sib5v1j/_remote_module_non_scriptable.py 2022-05-18T04:34:49.6826494Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8odpewys 2022-05-18T04:34:49.6829287Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8odpewys/_remote_module_non_scriptable.py 2022-05-18T04:34:49.7002533Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd767ga6b 2022-05-18T04:34:49.7005404Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd767ga6b/_remote_module_non_scriptable.py 2022-05-18T04:34:49.7040807Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4ogktawx 2022-05-18T04:34:49.7043827Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4ogktawx/_remote_module_non_scriptable.py 2022-05-18T04:34:50.0252818Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:34:50.0449046Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:34:50.0525514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:34:50.0751136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:03.3887858Z ok (16.398s) 2022-05-18T04:35:03.3891640Z 2022-05-18T04:35:03.3892188Z ---------------------------------------------------------------------- 2022-05-18T04:35:03.3892551Z Ran 1 test in 16.398s 2022-05-18T04:35:03.3892726Z 2022-05-18T04:35:03.3892824Z OK 2022-05-18T04:35:03.3892945Z 2022-05-18T04:35:03.3895614Z Generating XML reports... 
2022-05-18T04:35:03.3938447Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043446.xml 2022-05-18T04:35:04.5756769Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpze5pdv3u 2022-05-18T04:35:04.5758162Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpze5pdv3u/_remote_module_non_scriptable.py 2022-05-18T04:35:04.9470203Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:35:04.9485325Z 2022-05-18T04:35:04.9485701Z Running tests... 2022-05-18T04:35:04.9486191Z ---------------------------------------------------------------------- 2022-05-18T04:35:06.6071679Z test_rref_forward_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:06.6707600Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27237 2022-05-18T04:35:06.6813073Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27238 2022-05-18T04:35:06.6920595Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 27239 2022-05-18T04:35:06.7029395Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 27240 2022-05-18T04:35:07.5769746Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnu8m_zx5 2022-05-18T04:35:07.5770616Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnu8m_zx5/_remote_module_non_scriptable.py 2022-05-18T04:35:07.5859953Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0nkyz1dx 2022-05-18T04:35:07.5862982Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0nkyz1dx/_remote_module_non_scriptable.py 2022-05-18T04:35:07.6429071Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpktjbb1ib 2022-05-18T04:35:07.6431774Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpktjbb1ib/_remote_module_non_scriptable.py 2022-05-18T04:35:07.6456459Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi2x0sjhy 2022-05-18T04:35:07.6459340Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi2x0sjhy/_remote_module_non_scriptable.py 2022-05-18T04:35:07.9342701Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:07.9538090Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:07.9964403Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:35:07.9995306Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:35:19.8330122Z ok (14.884s) 2022-05-18T04:35:19.8330395Z 2022-05-18T04:35:19.8331166Z ---------------------------------------------------------------------- 2022-05-18T04:35:19.8331507Z Ran 1 test in 14.884s 2022-05-18T04:35:19.8331679Z 2022-05-18T04:35:19.8331781Z OK 2022-05-18T04:35:19.8331921Z 2022-05-18T04:35:19.8332373Z Generating XML reports... 
2022-05-18T04:35:19.8375242Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043504.xml 2022-05-18T04:35:20.9924755Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptem7rb7p 2022-05-18T04:35:20.9925598Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptem7rb7p/_remote_module_non_scriptable.py 2022-05-18T04:35:21.3517170Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:35:21.3531923Z 2022-05-18T04:35:21.3532159Z Running tests... 2022-05-18T04:35:21.3532610Z ---------------------------------------------------------------------- 2022-05-18T04:35:22.9654571Z test_rref_forward_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:23.0299271Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27589 2022-05-18T04:35:23.0402578Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27590 2022-05-18T04:35:23.0510839Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 27591 2022-05-18T04:35:23.0617656Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 27592 2022-05-18T04:35:24.0045506Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4uzo9wth 2022-05-18T04:35:24.0046141Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4uzo9wth/_remote_module_non_scriptable.py 2022-05-18T04:35:24.0333697Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4kpgu0qm 2022-05-18T04:35:24.0336662Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4kpgu0qm/_remote_module_non_scriptable.py 2022-05-18T04:35:24.0388716Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv6w7whoa 2022-05-18T04:35:24.0391330Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv6w7whoa/_remote_module_non_scriptable.py 2022-05-18T04:35:24.0415829Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0qhbycia 2022-05-18T04:35:24.0418620Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0qhbycia/_remote_module_non_scriptable.py 2022-05-18T04:35:24.3634395Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:24.3915415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:35:24.3949204Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:35:24.3992410Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:36.7930215Z ok (15.439s) 2022-05-18T04:35:36.7930433Z 2022-05-18T04:35:36.7930850Z ---------------------------------------------------------------------- 2022-05-18T04:35:36.7931450Z Ran 1 test in 15.440s 2022-05-18T04:35:36.7931620Z 2022-05-18T04:35:36.7931718Z OK 2022-05-18T04:35:36.7931860Z 2022-05-18T04:35:36.7931978Z Generating XML reports... 
2022-05-18T04:35:36.7974879Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043521.xml 2022-05-18T04:35:37.9706069Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4h1yxuog 2022-05-18T04:35:37.9707271Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4h1yxuog/_remote_module_non_scriptable.py 2022-05-18T04:35:38.3459923Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:35:38.3475583Z 2022-05-18T04:35:38.3475818Z Running tests... 2022-05-18T04:35:38.3476265Z ---------------------------------------------------------------------- 2022-05-18T04:35:40.0188805Z test_rref_forward_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:40.0836052Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 27944 2022-05-18T04:35:40.0941620Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 27945 2022-05-18T04:35:40.1049355Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 27946 2022-05-18T04:35:40.1160708Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 27947 2022-05-18T04:35:40.9994150Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7y7js3me 2022-05-18T04:35:40.9995568Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7y7js3me/_remote_module_non_scriptable.py 2022-05-18T04:35:41.0325214Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw_ril78q 2022-05-18T04:35:41.0327653Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw_ril78q/_remote_module_non_scriptable.py 2022-05-18T04:35:41.0328330Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu6v3jkjr 2022-05-18T04:35:41.0331928Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu6v3jkjr/_remote_module_non_scriptable.py 2022-05-18T04:35:41.0470091Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm4lk6nft 2022-05-18T04:35:41.0472687Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm4lk6nft/_remote_module_non_scriptable.py 2022-05-18T04:35:41.3694880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:41.3870986Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:35:41.3895063Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:35:41.4000732Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:35:53.5465045Z ok (15.199s) 2022-05-18T04:35:53.5465278Z 2022-05-18T04:35:53.5465846Z ---------------------------------------------------------------------- 2022-05-18T04:35:53.5466201Z Ran 1 test in 15.199s 2022-05-18T04:35:53.5466375Z 2022-05-18T04:35:53.5466472Z OK 2022-05-18T04:35:53.5466594Z 2022-05-18T04:35:53.5466736Z Generating XML reports... 
2022-05-18T04:35:53.5508766Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043538.xml 2022-05-18T04:35:54.7144575Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7s7sq9fx 2022-05-18T04:35:54.7145582Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7s7sq9fx/_remote_module_non_scriptable.py 2022-05-18T04:35:55.0860340Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:35:55.0875495Z 2022-05-18T04:35:55.0875948Z Running tests... 2022-05-18T04:35:55.0876465Z ---------------------------------------------------------------------- 2022-05-18T04:35:56.7369248Z test_rref_forward_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:35:56.8017868Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28299 2022-05-18T04:35:56.8121915Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28300 2022-05-18T04:35:56.8232877Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 28301 2022-05-18T04:35:56.8342337Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 28302 2022-05-18T04:35:57.6982860Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn390v44x 2022-05-18T04:35:57.6984065Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn390v44x/_remote_module_non_scriptable.py 2022-05-18T04:35:57.7083248Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc6moubmm 2022-05-18T04:35:57.7085580Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc6moubmm/_remote_module_non_scriptable.py 2022-05-18T04:35:57.7197240Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn_k040kh 2022-05-18T04:35:57.7201226Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn_k040kh/_remote_module_non_scriptable.py 2022-05-18T04:35:57.7207807Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc0w9kihi 2022-05-18T04:35:57.7210216Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc0w9kihi/_remote_module_non_scriptable.py 2022-05-18T04:35:58.0545596Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:35:58.0622789Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:35:58.0806325Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:35:58.0887020Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:10.2648985Z ok (15.177s) 2022-05-18T04:36:10.2649197Z 2022-05-18T04:36:10.2649610Z ---------------------------------------------------------------------- 2022-05-18T04:36:10.2649961Z Ran 1 test in 15.177s 2022-05-18T04:36:10.2650129Z 2022-05-18T04:36:10.2650226Z OK 2022-05-18T04:36:10.2650583Z 2022-05-18T04:36:10.2650703Z Generating XML reports... 
2022-05-18T04:36:10.2693096Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043555.xml 2022-05-18T04:36:11.3933186Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprglzh9p9 2022-05-18T04:36:11.3934548Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprglzh9p9/_remote_module_non_scriptable.py 2022-05-18T04:36:11.7620133Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:36:11.7635403Z 2022-05-18T04:36:11.7635752Z Running tests... 2022-05-18T04:36:11.7636232Z ---------------------------------------------------------------------- 2022-05-18T04:36:13.3997489Z test_rref_to_here_synchronization1 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:13.4631118Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 28651 2022-05-18T04:36:13.4736142Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 28652 2022-05-18T04:36:13.4842520Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 28653 2022-05-18T04:36:13.4949256Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 28654 2022-05-18T04:36:14.4188367Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfgiszqoo 2022-05-18T04:36:14.4189599Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfgiszqoo/_remote_module_non_scriptable.py 2022-05-18T04:36:14.4525086Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpppl3k2j9 2022-05-18T04:36:14.4526972Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpppl3k2j9/_remote_module_non_scriptable.py 2022-05-18T04:36:14.4844454Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5a3ffin_ 2022-05-18T04:36:14.4846500Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5a3ffin_/_remote_module_non_scriptable.py 2022-05-18T04:36:14.5096247Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8m_iadw0 2022-05-18T04:36:14.5098512Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8m_iadw0/_remote_module_non_scriptable.py 2022-05-18T04:36:14.7916111Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:14.8244322Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:14.8445071Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:36:14.8661268Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:36:28.3289203Z ok (16.565s) 2022-05-18T04:36:28.3289420Z 2022-05-18T04:36:28.3289835Z ---------------------------------------------------------------------- 2022-05-18T04:36:28.3290187Z Ran 1 test in 16.565s 2022-05-18T04:36:28.3290673Z 2022-05-18T04:36:28.3290776Z OK 2022-05-18T04:36:28.3290919Z 2022-05-18T04:36:28.3291040Z Generating XML reports... 
2022-05-18T04:36:28.3335294Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043611.xml 2022-05-18T04:36:29.4941946Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2bx4pcyx 2022-05-18T04:36:29.4942777Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2bx4pcyx/_remote_module_non_scriptable.py 2022-05-18T04:36:29.8536813Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:36:29.8551491Z 2022-05-18T04:36:29.8551744Z Running tests... 2022-05-18T04:36:29.8552181Z ---------------------------------------------------------------------- 2022-05-18T04:36:31.4673873Z test_rref_to_here_synchronization2 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:31.5308435Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29004 2022-05-18T04:36:31.5414461Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29005 2022-05-18T04:36:31.5521787Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 29006 2022-05-18T04:36:31.5629254Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 29007 2022-05-18T04:36:32.5200665Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuvrei1gh 2022-05-18T04:36:32.5201317Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuvrei1gh/_remote_module_non_scriptable.py 2022-05-18T04:36:32.5315086Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr4zkylb_ 2022-05-18T04:36:32.5317546Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr4zkylb_/_remote_module_non_scriptable.py 2022-05-18T04:36:32.5334985Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprrqy1v6i 2022-05-18T04:36:32.5338241Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprrqy1v6i/_remote_module_non_scriptable.py 2022-05-18T04:36:32.5817778Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8acylk68 2022-05-18T04:36:32.5820080Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8acylk68/_remote_module_non_scriptable.py 2022-05-18T04:36:32.8761948Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:32.8855125Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:36:32.9075454Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:32.9346271Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:36:48.2006578Z ok (18.345s) 2022-05-18T04:36:48.2006811Z 2022-05-18T04:36:48.2007201Z ---------------------------------------------------------------------- 2022-05-18T04:36:48.2007561Z Ran 1 test in 18.345s 2022-05-18T04:36:48.2007728Z 2022-05-18T04:36:48.2007827Z OK 2022-05-18T04:36:48.2008296Z 2022-05-18T04:36:48.2008437Z Generating XML reports... 
2022-05-18T04:36:48.2050891Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043629.xml 2022-05-18T04:36:49.3552043Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfbj6z50l 2022-05-18T04:36:49.3553527Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfbj6z50l/_remote_module_non_scriptable.py 2022-05-18T04:36:49.7138169Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:36:49.7152984Z 2022-05-18T04:36:49.7153292Z Running tests... 2022-05-18T04:36:49.7153813Z ---------------------------------------------------------------------- 2022-05-18T04:36:51.3219444Z test_rref_to_here_synchronization3 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:36:51.3849072Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29363 2022-05-18T04:36:51.3952395Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29364 2022-05-18T04:36:51.4059087Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 29365 2022-05-18T04:36:51.4166510Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 29366 2022-05-18T04:36:52.3645215Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnvd6j_v5 2022-05-18T04:36:52.3646080Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnvd6j_v5/_remote_module_non_scriptable.py 2022-05-18T04:36:52.3818828Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbr_yxiwe 2022-05-18T04:36:52.3821854Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbr_yxiwe/_remote_module_non_scriptable.py 2022-05-18T04:36:52.3915682Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzdqt4cm3 2022-05-18T04:36:52.3918214Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzdqt4cm3/_remote_module_non_scriptable.py 2022-05-18T04:36:52.4141019Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyly6bnzq 2022-05-18T04:36:52.4143745Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyly6bnzq/_remote_module_non_scriptable.py 2022-05-18T04:36:52.7189447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:36:52.7547558Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:36:52.7568521Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:36:52.7696013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:37:06.2502842Z ok (16.535s) 2022-05-18T04:37:06.2506165Z 2022-05-18T04:37:06.2506970Z ---------------------------------------------------------------------- 2022-05-18T04:37:06.2507604Z Ran 1 test in 16.535s 2022-05-18T04:37:06.2507785Z 2022-05-18T04:37:06.2507902Z OK 2022-05-18T04:37:06.2508130Z 2022-05-18T04:37:06.2508340Z Generating XML reports... 
2022-05-18T04:37:06.2554320Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043649.xml 2022-05-18T04:37:07.4450051Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9h749lh4 2022-05-18T04:37:07.4451152Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9h749lh4/_remote_module_non_scriptable.py 2022-05-18T04:37:07.8119413Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:37:07.8135219Z 2022-05-18T04:37:07.8135466Z Running tests... 2022-05-18T04:37:07.8135903Z ---------------------------------------------------------------------- 2022-05-18T04:37:09.4712361Z test_rref_to_here_synchronization4 (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:09.5357297Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 29716 2022-05-18T04:37:09.5462601Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 29717 2022-05-18T04:37:09.5572074Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 29718 2022-05-18T04:37:09.5680364Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 29719 2022-05-18T04:37:10.5171974Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp79sl8mfn 2022-05-18T04:37:10.5173325Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp79sl8mfn/_remote_module_non_scriptable.py 2022-05-18T04:37:10.5334362Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5y08um_u 2022-05-18T04:37:10.5336990Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5y08um_u/_remote_module_non_scriptable.py 2022-05-18T04:37:10.5434475Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptgg_8ul0 2022-05-18T04:37:10.5437368Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptgg_8ul0/_remote_module_non_scriptable.py 2022-05-18T04:37:10.5827372Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0nrnf1ng 2022-05-18T04:37:10.5829900Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0nrnf1ng/_remote_module_non_scriptable.py 2022-05-18T04:37:10.8707349Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:10.8870336Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:37:10.9075968Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:10.9406352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:37:26.8075755Z ok (18.994s) 2022-05-18T04:37:26.8077733Z 2022-05-18T04:37:26.8078447Z ---------------------------------------------------------------------- 2022-05-18T04:37:26.8078848Z Ran 1 test in 18.994s 2022-05-18T04:37:26.8080897Z 2022-05-18T04:37:26.8081392Z OK 2022-05-18T04:37:26.8081590Z 2022-05-18T04:37:26.8081743Z Generating XML reports... 
2022-05-18T04:37:26.8123326Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043707.xml 2022-05-18T04:37:27.9827723Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmputu616nw 2022-05-18T04:37:27.9828807Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmputu616nw/_remote_module_non_scriptable.py 2022-05-18T04:37:28.3539759Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:37:28.3555086Z 2022-05-18T04:37:28.3555326Z Running tests... 2022-05-18T04:37:28.3555774Z ---------------------------------------------------------------------- 2022-05-18T04:37:30.0038984Z test_rref_with_unpickleable_attributes (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:30.0672566Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30075 2022-05-18T04:37:30.0777979Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30076 2022-05-18T04:37:30.0885826Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 30077 2022-05-18T04:37:30.0992978Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 30078 2022-05-18T04:37:30.9888987Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph1dft9pv 2022-05-18T04:37:30.9889979Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph1dft9pv/_remote_module_non_scriptable.py 2022-05-18T04:37:31.0155229Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp16301411 2022-05-18T04:37:31.0157842Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp16301411/_remote_module_non_scriptable.py 2022-05-18T04:37:31.0239822Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbxax5mwh 2022-05-18T04:37:31.0242401Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbxax5mwh/_remote_module_non_scriptable.py 2022-05-18T04:37:31.0476679Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa91l8zdk 2022-05-18T04:37:31.0479209Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa91l8zdk/_remote_module_non_scriptable.py 2022-05-18T04:37:31.3392601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:37:31.3782582Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:37:31.3879542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:31.3976012Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:34.8129774Z ok (6.457s) 2022-05-18T04:37:34.8130006Z 2022-05-18T04:37:34.8130651Z ---------------------------------------------------------------------- 2022-05-18T04:37:34.8132846Z Ran 1 test in 6.457s 2022-05-18T04:37:34.8133063Z 2022-05-18T04:37:34.8133626Z OK 2022-05-18T04:37:34.8133931Z 2022-05-18T04:37:34.8134094Z Generating XML reports... 
2022-05-18T04:37:34.8176158Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043728.xml 2022-05-18T04:37:35.9948122Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi8gi1047 2022-05-18T04:37:35.9949581Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi8gi1047/_remote_module_non_scriptable.py 2022-05-18T04:37:36.3669644Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:37:36.3684733Z 2022-05-18T04:37:36.3684855Z Running tests... 2022-05-18T04:37:36.3685576Z ---------------------------------------------------------------------- 2022-05-18T04:37:38.0209756Z test_tensor_view_as_return_value (__main__.TensorPipeTensorPipeAgentCudaRpcTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:38.0858403Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 30432 2022-05-18T04:37:38.0964390Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 30433 2022-05-18T04:37:38.1073557Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 30434 2022-05-18T04:37:38.1183790Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 30435 2022-05-18T04:37:38.9940520Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz4d2o6bw 2022-05-18T04:37:38.9941684Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz4d2o6bw/_remote_module_non_scriptable.py 2022-05-18T04:37:39.0740821Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnrultrrz 2022-05-18T04:37:39.0742342Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnrultrrz/_remote_module_non_scriptable.py 2022-05-18T04:37:39.0819022Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpetek7_1q 2022-05-18T04:37:39.0821045Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpetek7_1q/_remote_module_non_scriptable.py 2022-05-18T04:37:39.1110416Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0z0kq5kd 2022-05-18T04:37:39.1112491Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0z0kq5kd/_remote_module_non_scriptable.py 2022-05-18T04:37:39.3498092Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:39.4327762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:39.4351985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:37:39.4721780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:37:45.0359597Z ok (8.667s) 2022-05-18T04:37:45.0359884Z 2022-05-18T04:37:45.0360316Z ---------------------------------------------------------------------- 2022-05-18T04:37:45.0360648Z Ran 1 test in 8.667s 2022-05-18T04:37:45.0360821Z 2022-05-18T04:37:45.0361287Z OK 2022-05-18T04:37:45.0361564Z 2022-05-18T04:37:45.0361743Z Generating XML reports... 
2022-05-18T04:37:45.0405597Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043736.xml 2022-05-18T04:37:46.2004522Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf8z8h98l 2022-05-18T04:37:46.2005969Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf8z8h98l/_remote_module_non_scriptable.py 2022-05-18T04:37:46.5657151Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:37:46.5672129Z 2022-05-18T04:37:46.5672525Z Running tests... 2022-05-18T04:37:46.5672958Z ---------------------------------------------------------------------- 2022-05-18T04:37:48.1840735Z test_device_maps_backward_pass (__main__.TensorPipeTensorPipeCudaDistAutogradTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:48.2490225Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31085 2022-05-18T04:37:48.2598248Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31086 2022-05-18T04:37:48.2707731Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 31087 2022-05-18T04:37:48.2817723Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 31088 2022-05-18T04:37:49.1598639Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2g7he6j8 2022-05-18T04:37:49.1599613Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2g7he6j8/_remote_module_non_scriptable.py 2022-05-18T04:37:49.1625252Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpflb4lj73 2022-05-18T04:37:49.1627926Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpflb4lj73/_remote_module_non_scriptable.py 2022-05-18T04:37:49.1830420Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvsthdpng 2022-05-18T04:37:49.1832839Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvsthdpng/_remote_module_non_scriptable.py 2022-05-18T04:37:49.2155679Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfzr1vsa8 2022-05-18T04:37:49.2158385Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfzr1vsa8/_remote_module_non_scriptable.py 2022-05-18T04:37:49.5144127Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:37:49.5301943Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:49.5448332Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:49.5769125Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:37:49.7872294Z skip: Need at least 4 CUDA devices (3.220s) 2022-05-18T04:37:49.7872559Z 2022-05-18T04:37:49.7873053Z ---------------------------------------------------------------------- 2022-05-18T04:37:49.7873637Z Ran 1 test in 3.220s 2022-05-18T04:37:49.7873811Z 2022-05-18T04:37:49.7873931Z OK (skipped=1) 2022-05-18T04:37:49.7874597Z 2022-05-18T04:37:49.7874758Z Generating XML reports... 
2022-05-18T04:37:49.7917620Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeCudaDistAutogradTest-20220518043746.xml 2022-05-18T04:37:50.9517624Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjq_3el5n 2022-05-18T04:37:50.9518718Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjq_3el5n/_remote_module_non_scriptable.py 2022-05-18T04:37:51.3077884Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:37:51.3092274Z 2022-05-18T04:37:51.3092678Z Running tests... 2022-05-18T04:37:51.3093180Z ---------------------------------------------------------------------- 2022-05-18T04:37:52.9192669Z test_dist_autograd_sync_streams (__main__.TensorPipeTensorPipeCudaDistAutogradTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:52.9832014Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31266 2022-05-18T04:37:52.9938913Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31267 2022-05-18T04:37:53.0044859Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 31268 2022-05-18T04:37:53.0152418Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 31269 2022-05-18T04:37:53.9778014Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgc5sdly6 2022-05-18T04:37:53.9779137Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgc5sdly6/_remote_module_non_scriptable.py 2022-05-18T04:37:53.9824277Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplotvy16s 2022-05-18T04:37:53.9826885Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplotvy16s/_remote_module_non_scriptable.py 2022-05-18T04:37:53.9836989Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3zedfklz 2022-05-18T04:37:53.9839766Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3zedfklz/_remote_module_non_scriptable.py 2022-05-18T04:37:54.0151302Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsvhb9ggd 2022-05-18T04:37:54.0153734Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsvhb9ggd/_remote_module_non_scriptable.py 2022-05-18T04:37:54.3369370Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:54.3376292Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:37:54.3415752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:54.3707728Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:37:54.6210910Z skip: Need at least 4 CUDA devices (3.311s) 2022-05-18T04:37:54.6211563Z 2022-05-18T04:37:54.6211982Z ---------------------------------------------------------------------- 2022-05-18T04:37:54.6212346Z Ran 1 test in 3.312s 2022-05-18T04:37:54.6212518Z 2022-05-18T04:37:54.6212636Z OK (skipped=1) 2022-05-18T04:37:54.6212801Z 2022-05-18T04:37:54.6212912Z Generating XML reports... 
2022-05-18T04:37:54.6260711Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeCudaDistAutogradTest-20220518043751.xml 2022-05-18T04:37:55.8030395Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp77_zkp9g 2022-05-18T04:37:55.8031608Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp77_zkp9g/_remote_module_non_scriptable.py 2022-05-18T04:37:56.1802989Z Test results will be stored in test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent 2022-05-18T04:37:56.1818527Z 2022-05-18T04:37:56.1818775Z Running tests... 2022-05-18T04:37:56.1819298Z ---------------------------------------------------------------------- 2022-05-18T04:37:57.8363568Z test_gradients_synchronizations (__main__.TensorPipeTensorPipeCudaDistAutogradTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:37:57.9015283Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31447 2022-05-18T04:37:57.9121535Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31448 2022-05-18T04:37:57.9231356Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 31449 2022-05-18T04:37:57.9342853Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 31450 2022-05-18T04:37:58.8439896Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo7hgqvnm 2022-05-18T04:37:58.8440738Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo7hgqvnm/_remote_module_non_scriptable.py 2022-05-18T04:37:58.8702072Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr18fc7_1 2022-05-18T04:37:58.8706175Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr18fc7_1/_remote_module_non_scriptable.py 2022-05-18T04:37:58.8778860Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjod3co9u 2022-05-18T04:37:58.8781659Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjod3co9u/_remote_module_non_scriptable.py 2022-05-18T04:37:58.8970845Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp07s6ieib 2022-05-18T04:37:58.8973757Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp07s6ieib/_remote_module_non_scriptable.py 2022-05-18T04:37:59.2012634Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:37:59.2329980Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:37:59.2353931Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T04:37:59.2656843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:37:59.4399824Z skip: Need at least 4 CUDA devices (3.258s) 2022-05-18T04:37:59.4400087Z 2022-05-18T04:37:59.4400486Z ---------------------------------------------------------------------- 2022-05-18T04:37:59.4400816Z Ran 1 test in 3.258s 2022-05-18T04:37:59.4400983Z 2022-05-18T04:37:59.4401096Z OK (skipped=1) 2022-05-18T04:37:59.4401257Z 2022-05-18T04:37:59.4401401Z Generating XML reports... 2022-05-18T04:37:59.4444706Z Generated XML report: test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeCudaDistAutogradTest-20220518043756.xml 2022-05-18T04:37:59.8589187Z Running distributed/fsdp/test_fsdp_core ... 
[2022-05-18 04:37:59.858388] 2022-05-18T04:37:59.8589930Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_core.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 04:37:59.858501] 2022-05-18T04:38:00.7800744Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvv90yn5s 2022-05-18T04:38:00.7801647Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvv90yn5s/_remote_module_non_scriptable.py 2022-05-18T04:38:00.8086657Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_core 2022-05-18T04:38:00.8131711Z 2022-05-18T04:38:00.8131983Z Running tests... 2022-05-18T04:38:00.8132401Z ---------------------------------------------------------------------- 2022-05-18T04:38:02.4605518Z test_backward_hooks_after_save (__main__.TestHooks) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:38:02.4963930Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31628 2022-05-18T04:38:02.5073777Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31629 2022-05-18T04:38:03.4327823Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpil5kcd4m 2022-05-18T04:38:03.4328970Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpil5kcd4m/_remote_module_non_scriptable.py 2022-05-18T04:38:03.4558370Z dist init r=0, world=2 2022-05-18T04:38:03.4562888Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:03.4734506Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_1i4h262 2022-05-18T04:38:03.4737216Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_1i4h262/_remote_module_non_scriptable.py 2022-05-18T04:38:03.4962452Z dist init r=1, world=2 2022-05-18T04:38:03.4966947Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:03.4968261Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:03.4971677Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:04.9226919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:04.9227483Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:04.9572946Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:04.9573685Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:04.9574547Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:04.9575172Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:05.9166290Z ok (5.103s) 2022-05-18T04:38:05.9295013Z test_output_backward_hooks_cuda_first_False (__main__.TestHooks) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31715 2022-05-18T04:38:05.9402183Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31716 2022-05-18T04:38:06.8408355Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpscf04ddq 2022-05-18T04:38:06.8409277Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpscf04ddq/_remote_module_non_scriptable.py 2022-05-18T04:38:06.8551345Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp4jwlajm 2022-05-18T04:38:06.8554171Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp4jwlajm/_remote_module_non_scriptable.py 2022-05-18T04:38:06.8638762Z dist init r=0, world=2 2022-05-18T04:38:06.8643313Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:06.8772785Z dist init r=1, world=2 2022-05-18T04:38:06.8777123Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:06.8778291Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:06.8848786Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:08.2901208Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:08.2901760Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:08.3214202Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:08.3214966Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:08.3215998Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:08.3216637Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:09.2488891Z ok (3.332s) 2022-05-18T04:38:09.2612821Z test_output_backward_hooks_cuda_first_True (__main__.TestHooks) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31802 2022-05-18T04:38:09.2717290Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31803 2022-05-18T04:38:10.1646612Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuq7o4y1e 2022-05-18T04:38:10.1647636Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuq7o4y1e/_remote_module_non_scriptable.py 2022-05-18T04:38:10.1698072Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphkbssdoa 2022-05-18T04:38:10.1700786Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphkbssdoa/_remote_module_non_scriptable.py 2022-05-18T04:38:10.1870121Z dist init r=0, world=2 2022-05-18T04:38:10.1874471Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:10.1930059Z dist init r=1, world=2 2022-05-18T04:38:10.1934657Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:10.1935712Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:10.1978335Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:11.5565907Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:11.5566440Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:12.4813919Z ok (3.232s) 2022-05-18T04:38:12.4831174Z test_register_functions_called_cuda_first_False_mixed_precision_False (__main__.TestHooks) 2022-05-18T04:38:12.4949024Z Tests that _register_{pre|post}_backward_hooks called during forward. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31889 2022-05-18T04:38:12.5056187Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31890 2022-05-18T04:38:13.4037197Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnzh5bvd4 2022-05-18T04:38:13.4038355Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnzh5bvd4/_remote_module_non_scriptable.py 2022-05-18T04:38:13.4183035Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplwennbiq 2022-05-18T04:38:13.4185822Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplwennbiq/_remote_module_non_scriptable.py 2022-05-18T04:38:13.4257138Z dist init r=0, world=2 2022-05-18T04:38:13.4261505Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:13.4414404Z dist init r=1, world=2 2022-05-18T04:38:13.4418769Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:13.4419921Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:13.4466658Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:38:14.8221110Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:14.8221665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:14.8531614Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:14.8532294Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:14.8535474Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:14.8536446Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:15.7153804Z ok (3.234s) 2022-05-18T04:38:15.7167341Z test_register_functions_called_cuda_first_False_mixed_precision_True (__main__.TestHooks) 2022-05-18T04:38:15.7282731Z Tests that _register_{pre|post}_backward_hooks called during forward. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 31972 2022-05-18T04:38:15.7387021Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 31973 2022-05-18T04:38:16.6430296Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnrtaajg2 2022-05-18T04:38:16.6431352Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnrtaajg2/_remote_module_non_scriptable.py 2022-05-18T04:38:16.6659478Z dist init r=0, world=2 2022-05-18T04:38:16.6663795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:16.6748571Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpps5_n10j 2022-05-18T04:38:16.6751068Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpps5_n10j/_remote_module_non_scriptable.py 2022-05-18T04:38:16.6968257Z dist init r=1, world=2 2022-05-18T04:38:16.6972635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:16.6973731Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:16.7072517Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:18.0883186Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:18.0883726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:18.1175796Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:18.1176471Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:18.1177339Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:38:18.1177980Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:19.0475557Z ok (3.332s) 2022-05-18T04:38:19.0488786Z test_register_functions_called_cuda_first_True_mixed_precision_False (__main__.TestHooks) 2022-05-18T04:38:19.0604227Z Tests that _register_{pre|post}_backward_hooks called during forward. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32055 2022-05-18T04:38:19.0708505Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32056 2022-05-18T04:38:20.0205835Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk4ggy_cg 2022-05-18T04:38:20.0206692Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk4ggy_cg/_remote_module_non_scriptable.py 2022-05-18T04:38:20.0235007Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzrwtkzj9 2022-05-18T04:38:20.0237538Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzrwtkzj9/_remote_module_non_scriptable.py 2022-05-18T04:38:20.0431931Z dist init r=0, world=2 2022-05-18T04:38:20.0435220Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:20.0465786Z dist init r=1, world=2 2022-05-18T04:38:20.0470085Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:20.0471500Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:20.0538860Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:21.4166565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:21.4167150Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:22.3793339Z ok (3.332s) 2022-05-18T04:38:22.3807390Z test_register_functions_called_cuda_first_True_mixed_precision_True (__main__.TestHooks) 2022-05-18T04:38:22.3925468Z Tests that _register_{pre|post}_backward_hooks called during forward. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32138 2022-05-18T04:38:22.4032033Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32139 2022-05-18T04:38:23.2980014Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa2jqg6sk 2022-05-18T04:38:23.2981698Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa2jqg6sk/_remote_module_non_scriptable.py 2022-05-18T04:38:23.3208363Z dist init r=1, world=2 2022-05-18T04:38:23.3213321Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:23.3270509Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0q9g2pyc 2022-05-18T04:38:23.3273127Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0q9g2pyc/_remote_module_non_scriptable.py 2022-05-18T04:38:23.3489884Z dist init r=0, world=2 2022-05-18T04:38:23.3494294Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:23.3495414Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
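The UserWarning from fully_sharded_data_parallel.py:912 that keeps appearing above is raised because each test builds its module on CPU and lets FSDP move it to the current CUDA device for parameter verification, flattening, and sharding. Moving the module to the GPU before wrapping avoids that round trip; a minimal sketch, assuming a process group is already initialized and one CUDA device is visible per rank (the module and sizes are illustrative, not taken from the test):

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    device = torch.device("cuda", torch.cuda.current_device())

    # Constructing (or moving) the module on the target device up front means
    # FSDP does not have to shuttle parameters CPU -> GPU -> CPU at wrap time.
    model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8)).to(device)
    fsdp_model = FSDP(model)

    out = fsdp_model(torch.randn(4, 8, device=device))
    out.sum().backward()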
2022-05-18T04:38:23.3520081Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:24.7226507Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:24.7227043Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:25.7119724Z ok (3.332s) 2022-05-18T04:38:25.7257368Z test_transformer_no_grad_mixed_precision_False (__main__.TestNoGrad) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32221 2022-05-18T04:38:25.7368935Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32222 2022-05-18T04:38:26.6376721Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph100y9l4 2022-05-18T04:38:26.6378013Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph100y9l4/_remote_module_non_scriptable.py 2022-05-18T04:38:26.6478369Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsts5tdp8 2022-05-18T04:38:26.6481209Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsts5tdp8/_remote_module_non_scriptable.py 2022-05-18T04:38:26.6599503Z dist init r=1, world=2 2022-05-18T04:38:26.6603638Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:26.6712124Z dist init r=0, world=2 2022-05-18T04:38:26.6716692Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:26.6717940Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:26.6808916Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:28.0716628Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:28.0717531Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:28.1053922Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:28.1055390Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:28.1057063Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:28.1058262Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:29.0453877Z ok (3.333s) 2022-05-18T04:38:29.0587957Z test_transformer_no_grad_mixed_precision_True (__main__.TestNoGrad) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32308 2022-05-18T04:38:29.0692196Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32309 2022-05-18T04:38:29.9694061Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsrg0ond0 2022-05-18T04:38:29.9695196Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnbw0h92c 2022-05-18T04:38:29.9695758Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsrg0ond0/_remote_module_non_scriptable.py 2022-05-18T04:38:29.9698711Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnbw0h92c/_remote_module_non_scriptable.py 2022-05-18T04:38:29.9922262Z dist init r=1, world=2 2022-05-18T04:38:29.9925849Z dist init r=0, world=2 2022-05-18T04:38:29.9926235Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:29.9930237Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:29.9931728Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:30.0029966Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:31.3812417Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:31.3813350Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:31.4132443Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:31.4133119Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:31.4135452Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:31.4136117Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:32.0772355Z ok (3.032s) 2022-05-18T04:38:32.0907701Z test_param_change_after_init_mixed_precision_False (__main__.TestParamInit) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32395 2022-05-18T04:38:32.1015916Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32396 2022-05-18T04:38:33.0000893Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw2wr1dme 2022-05-18T04:38:33.0002618Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw2wr1dme/_remote_module_non_scriptable.py 2022-05-18T04:38:33.0232335Z dist init r=1, world=2 2022-05-18T04:38:33.0236892Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:33.0397154Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmxuot1zk 2022-05-18T04:38:33.0399590Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmxuot1zk/_remote_module_non_scriptable.py 2022-05-18T04:38:33.0621655Z dist init r=0, world=2 2022-05-18T04:38:33.0626160Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:33.0627102Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:33.0645192Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:34.4367569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:34.4368122Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:34.4696727Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:34.4697408Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:34.4698260Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:34.4698871Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:35.3101376Z ok (3.233s) 2022-05-18T04:38:35.3230653Z test_param_change_after_init_mixed_precision_True (__main__.TestParamInit) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32478 2022-05-18T04:38:35.3334371Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32479 2022-05-18T04:38:36.2287184Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsw6l0jsn 2022-05-18T04:38:36.2288176Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsw6l0jsn/_remote_module_non_scriptable.py 2022-05-18T04:38:36.2310095Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnne9s74y 2022-05-18T04:38:36.2312937Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnne9s74y/_remote_module_non_scriptable.py 2022-05-18T04:38:36.2505830Z dist init r=1, world=2 2022-05-18T04:38:36.2510044Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:36.2541796Z dist init r=0, world=2 2022-05-18T04:38:36.2546383Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:36.2547556Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:36.2613597Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:37.6396291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:37.6396848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:37.6692344Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:37.6693018Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:37.6725781Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:37.6726697Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:38.5418929Z ok (3.232s) 2022-05-18T04:38:38.5547277Z test_delayed_optim_step_offload_false_none_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32561 2022-05-18T04:38:38.5654757Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32562 2022-05-18T04:38:39.4595533Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpazbgfje3 2022-05-18T04:38:39.4596451Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpazbgfje3/_remote_module_non_scriptable.py 2022-05-18T04:38:39.4816706Z dist init r=1, world=2 2022-05-18T04:38:39.4820892Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:39.5105306Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp072qmmo4 2022-05-18T04:38:39.5108310Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp072qmmo4/_remote_module_non_scriptable.py 2022-05-18T04:38:39.5331899Z dist init r=0, world=2 2022-05-18T04:38:39.5336632Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:39.5337902Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:39.5432843Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:40.9053301Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:40.9053822Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:41.4746005Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:41.4746570Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:41.4779708Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:41.4780376Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:41.4781231Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:41.4781871Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:42.0614553Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:38:42.0615391Z warnings.warn(msg, FutureWarning) 2022-05-18T04:38:42.0616323Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:38:42.0616996Z warnings.warn(msg, FutureWarning) 2022-05-18T04:38:42.2144711Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
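The FutureWarning above is triggered because the test suite still calls torch.testing.assert_allclose, which is deprecated in favor of torch.testing.assert_close (details in the linked issue #61844). The replacement is close to drop-in; a minimal sketch of the migration, with illustrative tensors:

    import torch
    from torch.testing import assert_close

    expected = torch.tensor([1.0, 2.0, 3.0])
    actual = expected + 1e-7  # tiny numerical difference

    # Deprecated form:
    #   torch.testing.assert_allclose(actual, expected)
    # Replacement; rtol/atol default to dtype-based tolerances and can be overridden.
    assert_close(actual, expected, rtol=1.3e-6, atol=1e-5)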
2022-05-18T04:38:42.2145224Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:42.8759243Z ok (4.334s) 2022-05-18T04:38:42.8885658Z test_delayed_optim_step_offload_false_none_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32648 2022-05-18T04:38:42.8989576Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32649 2022-05-18T04:38:43.7968790Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2h258sev 2022-05-18T04:38:43.7970209Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2h258sev/_remote_module_non_scriptable.py 2022-05-18T04:38:43.8028544Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxrgufhvv 2022-05-18T04:38:43.8031520Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxrgufhvv/_remote_module_non_scriptable.py 2022-05-18T04:38:43.8195165Z dist init r=0, world=2 2022-05-18T04:38:43.8199611Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:43.8259492Z dist init r=1, world=2 2022-05-18T04:38:43.8263830Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:43.8265412Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:43.8304257Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:45.2037981Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:45.2038960Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:45.7710538Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:45.7711529Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:45.7746312Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:45.7747969Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:45.7749917Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:45.7751338Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:46.3676311Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:38:46.3677718Z warnings.warn(msg, FutureWarning) 2022-05-18T04:38:46.3679658Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:38:46.3680874Z warnings.warn(msg, FutureWarning) 2022-05-18T04:38:46.5222669Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:46.5223652Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:47.3096631Z ok (4.434s) 2022-05-18T04:38:47.3221429Z test_delayed_optim_step_offload_false_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32735 2022-05-18T04:38:47.3325349Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32736 2022-05-18T04:38:48.2609616Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1y0okebj 2022-05-18T04:38:48.2611098Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1y0okebj/_remote_module_non_scriptable.py 2022-05-18T04:38:48.2836927Z dist init r=0, world=2 2022-05-18T04:38:48.2841218Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:48.3083505Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp69hsadsk 2022-05-18T04:38:48.3086196Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp69hsadsk/_remote_module_non_scriptable.py 2022-05-18T04:38:48.3312091Z dist init r=1, world=2 2022-05-18T04:38:48.3316442Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:48.3317615Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:48.3351169Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:49.6929731Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:49.6930603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:50.2599444Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:50.2599986Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:50.2633035Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:50.2633715Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:50.2634584Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:50.2635238Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:51.0460207Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:38:51.0460931Z warnings.warn(msg, FutureWarning) 2022-05-18T04:38:51.0461872Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:38:51.0462538Z warnings.warn(msg, FutureWarning) 2022-05-18T04:38:51.3019527Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:51.3020048Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:52.3441395Z ok (5.034s) 2022-05-18T04:38:52.3565800Z test_delayed_optim_step_offload_false_prefetch_post_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32822 2022-05-18T04:38:52.3669630Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32823 2022-05-18T04:38:53.2623200Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvwusymld 2022-05-18T04:38:53.2624278Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvwusymld/_remote_module_non_scriptable.py 2022-05-18T04:38:53.2633711Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwf8guztr 2022-05-18T04:38:53.2636616Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwf8guztr/_remote_module_non_scriptable.py 2022-05-18T04:38:53.2842892Z dist init r=0, world=2 2022-05-18T04:38:53.2847021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:53.2863313Z dist init r=1, world=2 2022-05-18T04:38:53.2867702Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:53.2868567Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:53.2950596Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:54.6729297Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:54.6729885Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:38:55.2400819Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:55.2401362Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:55.2435824Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:38:55.2436516Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:55.2437365Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:38:55.2437988Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:38:56.0167566Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:38:56.0168486Z warnings.warn(msg, FutureWarning) 2022-05-18T04:38:56.0171075Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:38:56.0171769Z warnings.warn(msg, FutureWarning) 2022-05-18T04:38:56.2730329Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:56.2730821Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:38:57.3788439Z ok (5.035s) 2022-05-18T04:38:57.3917135Z test_delayed_optim_step_offload_false_prefetch_post_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32909 2022-05-18T04:38:57.4025311Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32910 2022-05-18T04:38:58.2969270Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmxzklhwg 2022-05-18T04:38:58.2970497Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmxzklhwg/_remote_module_non_scriptable.py 2022-05-18T04:38:58.3200389Z dist init r=1, world=2 2022-05-18T04:38:58.3204618Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:38:58.3267097Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf2za85f5 2022-05-18T04:38:58.3269692Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf2za85f5/_remote_module_non_scriptable.py 2022-05-18T04:38:58.3487180Z dist init r=0, world=2 2022-05-18T04:38:58.3491344Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:38:58.3492568Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:58.3511708Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:38:59.7250929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:38:59.7251453Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:00.2925350Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:00.2925848Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:00.2959639Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:39:00.2960322Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:00.2961177Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:00.2961817Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:01.0802822Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:01.0803611Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:01.0805325Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:01.0806002Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:01.3365295Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:01.3365792Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:02.4141273Z ok (5.035s) 2022-05-18T04:39:02.4269009Z test_delayed_optim_step_offload_false_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 32996 2022-05-18T04:39:02.4375123Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 32997 2022-05-18T04:39:03.3559834Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5z9fvose 2022-05-18T04:39:03.3560958Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5z9fvose/_remote_module_non_scriptable.py 2022-05-18T04:39:03.3564449Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5lov_dss 2022-05-18T04:39:03.3567862Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5lov_dss/_remote_module_non_scriptable.py 2022-05-18T04:39:03.3788239Z dist init r=0, world=2 2022-05-18T04:39:03.3792718Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:39:03.3793959Z dist init r=1, world=2 2022-05-18T04:39:03.3798713Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:39:03.3799535Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:03.3896252Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:04.7744908Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:04.7745450Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:05.3402319Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:05.3402881Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:39:05.3436509Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:05.3437191Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:05.3438038Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:05.3438676Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:06.1279870Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:06.1280626Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:06.1281580Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:06.1282240Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:06.3843754Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:06.3844760Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:07.4493560Z ok (5.035s) 2022-05-18T04:39:07.4619760Z test_delayed_optim_step_offload_false_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33083 2022-05-18T04:39:07.4724230Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33084 2022-05-18T04:39:08.3731308Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsolrgqe7 2022-05-18T04:39:08.3732616Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsolrgqe7/_remote_module_non_scriptable.py 2022-05-18T04:39:08.3735404Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplstw2fr2 2022-05-18T04:39:08.3738150Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplstw2fr2/_remote_module_non_scriptable.py 2022-05-18T04:39:08.3956641Z dist init r=0, world=2 2022-05-18T04:39:08.3960754Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:39:08.3963799Z dist init r=1, world=2 2022-05-18T04:39:08.3968719Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:39:08.3969518Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:08.4064318Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
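The TestParityWithDDP test names above encode a configuration grid for FSDP: CPU offloading (offload_false / offload_true), backward prefetching (prefetch_pre, prefetch_post, or none), and sharding strategy (no_shard, shard_grad_op, or the default full shard). A hedged sketch of how those knobs map onto the FSDP constructor, assuming a process group is already initialized; the enum and argument names below come from the public torch.distributed.fsdp API of this era, not from the test file itself:

    import torch.nn as nn
    from torch.distributed.fsdp import (
        FullyShardedDataParallel as FSDP,
        CPUOffload,
        BackwardPrefetch,
        ShardingStrategy,
    )

    # One point in the grid: offload_true + prefetch_pre + shard_grad_op.
    model = nn.Linear(8, 8).cuda()
    fsdp_model = FSDP(
        model,
        cpu_offload=CPUOffload(offload_params=True),
        backward_prefetch=BackwardPrefetch.BACKWARD_PRE,
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
    )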
2022-05-18T04:39:09.7716110Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:09.7716983Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:10.3353893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:10.3354430Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:10.3388697Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:10.3389418Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:10.3390275Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:10.3390910Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:11.1115603Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:11.1116535Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:11.1117657Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:11.1118329Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:11.3677459Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:11.3677994Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:12.4841559Z ok (5.035s) 2022-05-18T04:39:12.4967574Z test_delayed_optim_step_offload_false_prefetch_pre_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33170 2022-05-18T04:39:12.5071525Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33171 2022-05-18T04:39:13.4069562Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp16gpxq29 2022-05-18T04:39:13.4070722Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp16gpxq29/_remote_module_non_scriptable.py 2022-05-18T04:39:13.4300821Z dist init r=0, world=2 2022-05-18T04:39:13.4305245Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:39:13.4364244Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe5xaa0pl 2022-05-18T04:39:13.4366690Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe5xaa0pl/_remote_module_non_scriptable.py 2022-05-18T04:39:13.4585413Z dist init r=1, world=2 2022-05-18T04:39:13.4589634Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:39:13.4590443Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:13.4612184Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:14.8594112Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:14.8594668Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:15.4180325Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:15.4180882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:15.4215504Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:15.4216186Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:15.4217191Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:15.4217837Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:16.2061153Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:16.2061886Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:16.2064941Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:16.2065611Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:16.4623764Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
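The "Reducer buckets have been rebuilt in this iteration" INFO lines come from DistributedDataParallel, which these parity tests run as the reference implementation next to FSDP; DDP's Reducer groups gradients into fixed-size buckets for all-reduce and rebuilds the bucket layout after it has seen the first real backward pass. A minimal sketch of the DDP side, assuming an initialized process group and one CUDA device per rank (model and data are illustrative):

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    device = torch.device("cuda", torch.cuda.current_device())
    model = nn.Linear(8, 8).to(device)

    # bucket_cap_mb sets the gradient bucket size; the Reducer logs the
    # "buckets have been rebuilt" message once after the first backward.
    ddp_model = DDP(model, device_ids=[device.index], bucket_cap_mb=25)

    loss = ddp_model(torch.randn(4, 8, device=device)).sum()
    loss.backward()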
2022-05-18T04:39:16.4624294Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:17.5190542Z ok (5.035s) 2022-05-18T04:39:17.5316426Z test_delayed_optim_step_offload_false_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33257 2022-05-18T04:39:17.5422922Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33258 2022-05-18T04:39:18.4471515Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5qnrqgfl 2022-05-18T04:39:18.4472408Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5qnrqgfl/_remote_module_non_scriptable.py 2022-05-18T04:39:18.4603673Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplnd93awi 2022-05-18T04:39:18.4606842Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplnd93awi/_remote_module_non_scriptable.py 2022-05-18T04:39:18.4694835Z dist init r=1, world=2 2022-05-18T04:39:18.4698792Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:39:18.4836884Z dist init r=0, world=2 2022-05-18T04:39:18.4841365Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:39:18.4842468Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:18.4904369Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:19.8532596Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:19.8533162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:20.4192202Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:20.4192735Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:20.4226936Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:20.4227885Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:20.4228813Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:20.4229598Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:21.2063179Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:21.2064076Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:21.2066710Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:21.2067401Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:21.4624968Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:21.4625497Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:22.5539672Z ok (5.035s) 2022-05-18T04:39:22.5671830Z test_delayed_optim_step_offload_true_none_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33344 2022-05-18T04:39:22.5778045Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33345 2022-05-18T04:39:23.4833581Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2qlzi_ch 2022-05-18T04:39:23.4834802Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2qlzi_ch/_remote_module_non_scriptable.py 2022-05-18T04:39:23.5064943Z dist init r=1, world=2 2022-05-18T04:39:23.5069505Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:39:23.5239448Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptagozski 2022-05-18T04:39:23.5241974Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptagozski/_remote_module_non_scriptable.py 2022-05-18T04:39:23.5461326Z dist init r=0, world=2 2022-05-18T04:39:23.5465650Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:39:23.5466689Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:23.5477841Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:24.9266803Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:24.9267399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:25.4922122Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:25.4922657Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:25.4956734Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:25.4957418Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:25.4958268Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:25.4959143Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:25.7477662Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:39:25.7479229Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:25.7480504Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:25.7481774Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:25.7483037Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:25.7484294Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:25.7485557Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:25.7486824Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:25.9988990Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:25.9989521Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:26.7943293Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:26.7944003Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:26.7953122Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:39:26.7953796Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:27.0510812Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:27.0511488Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:28.1908848Z ok (5.637s) 2022-05-18T04:39:28.2037182Z test_delayed_optim_step_offload_true_none_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33431 2022-05-18T04:39:28.2143426Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33432 2022-05-18T04:39:29.1110899Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpltypy1ko 2022-05-18T04:39:29.1111816Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpltypy1ko/_remote_module_non_scriptable.py 2022-05-18T04:39:29.1122364Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvhcey8m3 2022-05-18T04:39:29.1125471Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvhcey8m3/_remote_module_non_scriptable.py 2022-05-18T04:39:29.1332169Z dist init r=0, world=2 2022-05-18T04:39:29.1336686Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:39:29.1354087Z dist init r=1, world=2 2022-05-18T04:39:29.1358825Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:39:29.1359831Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:29.1440307Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:30.5179058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:30.5179605Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:31.0839675Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:31.0840229Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:31.0874359Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:31.0875120Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:31.0875986Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:31.0876611Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:31.3031724Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:31.3033064Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:31.3034355Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:31.3035629Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:31.3037201Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:31.3038486Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:31.3039747Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:31.3041015Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:31.5034358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:31.5034877Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:32.1576569Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:32.1577270Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:32.1579807Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:32.1580476Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:32.3628564Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
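For context on the "dist init r=N, world=2", store-based-barrier, and "Reducer buckets have been rebuilt in this iteration" messages above: these parity tests run two worker processes, each of which joins a process group (init_process_group performs the store-based barrier logged here) and wraps its module in DistributedDataParallel, whose gradient-bucketing reducer logs the rebuild message after the first backward pass. A minimal sketch of that pattern, assuming a 2-GPU NCCL setup; the function and model below are illustrative and not taken from the test suite:

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def run_worker(rank, world_size):
    # init_process_group performs the store-based barrier seen in the log
    # ("Added key: store_based_barrier_key:1 to store for rank: ...").
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(8, 8).cuda(rank)
    # DDP buckets gradients for all-reduce; "Reducer buckets have been
    # rebuilt in this iteration" is logged once real bucket sizes are known.
    ddp_model = DDP(model, device_ids=[rank], bucket_cap_mb=25)

    out = ddp_model(torch.randn(4, 8, device=f"cuda:{rank}"))
    out.sum().backward()
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run_worker, args=(2,), nprocs=2)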
2022-05-18T04:39:32.3629057Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:33.3263232Z ok (5.135s) 2022-05-18T04:39:33.3391469Z test_delayed_optim_step_offload_true_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33518 2022-05-18T04:39:33.3497283Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33519 2022-05-18T04:39:34.2989287Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcses0a9p 2022-05-18T04:39:34.2990447Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcses0a9p/_remote_module_non_scriptable.py 2022-05-18T04:39:34.2994786Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoyti_dez 2022-05-18T04:39:34.2997754Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoyti_dez/_remote_module_non_scriptable.py 2022-05-18T04:39:34.3218736Z dist init r=1, world=2 2022-05-18T04:39:34.3223592Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:39:34.3227045Z dist init r=0, world=2 2022-05-18T04:39:34.3231745Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:39:34.3233042Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:34.3327368Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:35.7048082Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:35.7048921Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:36.2783289Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:36.2783840Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:36.2816896Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:36.2817567Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:36.2818428Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:36.2819069Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:36.5344020Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:36.5345634Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:39:36.5346921Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:36.5348191Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:36.5349446Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:36.5350718Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:36.5352012Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:36.5353247Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:36.7857546Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:36.7858195Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:37.5913163Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:37.5913871Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:37.5916927Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:37.5917604Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:37.8476295Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:37.8476818Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:38.9622731Z ok (5.636s) 2022-05-18T04:39:38.9750347Z test_delayed_optim_step_offload_true_prefetch_post_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33605 2022-05-18T04:39:38.9859965Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33606 2022-05-18T04:39:39.8838022Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphhonlg0o 2022-05-18T04:39:39.8838933Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphhonlg0o/_remote_module_non_scriptable.py 2022-05-18T04:39:39.9057937Z dist init r=0, world=2 2022-05-18T04:39:39.9062174Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:39:39.9159877Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppuhaf9bh 2022-05-18T04:39:39.9162324Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppuhaf9bh/_remote_module_non_scriptable.py 2022-05-18T04:39:39.9380255Z dist init r=1, world=2 2022-05-18T04:39:39.9384415Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:39:39.9385445Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:39.9470568Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:41.3062175Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:41.3062750Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:41.8661968Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:41.8662502Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:41.8696010Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:41.8696693Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:41.8697526Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:41.8698387Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:42.1218881Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:42.1220226Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:42.1221495Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:42.1222781Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:42.1224042Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:42.1225295Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:42.1226569Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:42.1227827Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:42.3727309Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:42.3727851Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:43.1680779Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:43.1681469Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:43.1682411Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:43.1683075Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:43.4236198Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:43.4236929Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:44.4984612Z ok (5.536s) 2022-05-18T04:39:44.5109811Z test_delayed_optim_step_offload_true_prefetch_post_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33692 2022-05-18T04:39:44.5213399Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33693 2022-05-18T04:39:45.4329911Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsjxhc56u 2022-05-18T04:39:45.4331322Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsjxhc56u/_remote_module_non_scriptable.py 2022-05-18T04:39:45.4550136Z dist init r=0, world=2 2022-05-18T04:39:45.4554464Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:39:45.4685669Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptvcywoj3 2022-05-18T04:39:45.4688195Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptvcywoj3/_remote_module_non_scriptable.py 2022-05-18T04:39:45.4908122Z dist init r=1, world=2 2022-05-18T04:39:45.4912380Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:39:45.4913323Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:45.4962860Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:46.8455695Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:46.8456232Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:47.4107657Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:47.4108209Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:47.4141758Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:47.4142444Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:47.4143293Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:47.4143937Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:47.6665695Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:47.6667651Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:47.6668946Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:47.6670215Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:47.6671762Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:47.6673054Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:47.6674314Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:47.6675578Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:47.9179051Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:47.9179581Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:48.7250957Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:48.7252001Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:48.7253359Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:48.7254038Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:48.9812344Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:48.9812869Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:50.1351712Z ok (5.637s) 2022-05-18T04:39:50.1486145Z test_delayed_optim_step_offload_true_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33779 2022-05-18T04:39:50.1596309Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33780 2022-05-18T04:39:51.0613194Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2x4dslje 2022-05-18T04:39:51.0614081Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2x4dslje/_remote_module_non_scriptable.py 2022-05-18T04:39:51.0630633Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa7fgemjk 2022-05-18T04:39:51.0633406Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa7fgemjk/_remote_module_non_scriptable.py 2022-05-18T04:39:51.0837956Z dist init r=0, world=2 2022-05-18T04:39:51.0842393Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:39:51.0862327Z dist init r=1, world=2 2022-05-18T04:39:51.0866794Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:39:51.0867916Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:51.0946052Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:52.4530011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:52.4531113Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:53.0202371Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:53.0202894Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:53.0237083Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:53.0237758Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:53.0238604Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:53.0239245Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:53.2761421Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:53.2762728Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:53.2764007Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:53.2765277Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:53.2766546Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:53.2767817Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:53.2769080Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:53.2770599Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:53.5276096Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:53.5276752Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:54.3337409Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:54.3338118Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:54.3341210Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:54.3341898Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:54.5899856Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:54.5900356Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:55.6723835Z ok (5.537s) 2022-05-18T04:39:55.6848804Z test_delayed_optim_step_offload_true_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33866 2022-05-18T04:39:55.6952816Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33867 2022-05-18T04:39:56.5938432Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj5vtfx3e 2022-05-18T04:39:56.5939436Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj5vtfx3e/_remote_module_non_scriptable.py 2022-05-18T04:39:56.6169330Z dist init r=1, world=2 2022-05-18T04:39:56.6174288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:39:56.6254230Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvr0rb708 2022-05-18T04:39:56.6257187Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvr0rb708/_remote_module_non_scriptable.py 2022-05-18T04:39:56.6480777Z dist init r=0, world=2 2022-05-18T04:39:56.6485037Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:39:56.6486170Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:56.6583453Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:39:58.0329760Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:39:58.0330319Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:39:58.6005729Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:58.6006285Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:58.6040564Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:58.6041234Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:58.6042069Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:39:58.6042992Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:39:58.8562271Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:58.8563624Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:58.8564928Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:58.8566219Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:58.8567473Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:58.8568735Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:58.8569999Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:58.8571534Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:39:59.1073302Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:59.1073830Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:39:59.9043048Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:59.9044042Z warnings.warn(msg, FutureWarning) 2022-05-18T04:39:59.9045686Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:39:59.9046452Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:00.1604083Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:00.1604768Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:01.3084283Z ok (5.636s) 2022-05-18T04:40:01.3210745Z test_delayed_optim_step_offload_true_prefetch_pre_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 33953 2022-05-18T04:40:01.3317739Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 33954 2022-05-18T04:40:02.2486758Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2oatykoc 2022-05-18T04:40:02.2487891Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2oatykoc/_remote_module_non_scriptable.py 2022-05-18T04:40:02.2715242Z dist init r=0, world=2 2022-05-18T04:40:02.2719772Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:40:02.2870629Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8z791_hy 2022-05-18T04:40:02.2873393Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8z791_hy/_remote_module_non_scriptable.py 2022-05-18T04:40:02.3092516Z dist init r=1, world=2 2022-05-18T04:40:02.3097145Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:40:02.3097958Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:02.3127983Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:03.6612266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:03.6612843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:04.2225607Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:04.2226182Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:04.2259102Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:04.2259793Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:04.2260618Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:04.2261257Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:04.4784027Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:04.4785364Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:04.4786657Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:04.4787925Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:04.4789493Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:04.4790773Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:04.4792023Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:04.4793288Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:04.7292961Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:04.7293492Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:05.5354776Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:05.5355492Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:05.5356438Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:05.5357095Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:05.7908711Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:05.7909237Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:06.8442248Z ok (5.536s) 2022-05-18T04:40:06.8568869Z test_delayed_optim_step_offload_true_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34040 2022-05-18T04:40:06.8674362Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34041 2022-05-18T04:40:07.7709131Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwu7b1l1t 2022-05-18T04:40:07.7710012Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwu7b1l1t/_remote_module_non_scriptable.py 2022-05-18T04:40:07.7845925Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp019i1doe 2022-05-18T04:40:07.7848721Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp019i1doe/_remote_module_non_scriptable.py 2022-05-18T04:40:07.7931421Z dist init r=0, world=2 2022-05-18T04:40:07.7935917Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:40:07.8078224Z dist init r=1, world=2 2022-05-18T04:40:07.8082590Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:40:07.8083586Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:07.8141253Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:09.1779755Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:09.1780311Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:09.7430598Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:09.7431141Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:09.7464693Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:09.7465386Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:09.7466241Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:09.7466894Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:09.9987971Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:09.9989287Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:09.9990591Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:09.9992155Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:09.9993437Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:09.9994712Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:09.9995972Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:09.9997233Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:40:10.2499725Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:10.2500411Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:11.0571179Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:11.0572123Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:11.0573488Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:11.0574182Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:11.3130738Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:11.3131272Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:12.4803765Z ok (5.636s) 2022-05-18T04:40:12.4936968Z test_delayed_reduce_scatter_offload_false_none_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34127 2022-05-18T04:40:12.5046380Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34128 2022-05-18T04:40:13.4436497Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpflqxu6b9 2022-05-18T04:40:13.4437759Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpflqxu6b9/_remote_module_non_scriptable.py 2022-05-18T04:40:13.4620361Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpluqmexh9 2022-05-18T04:40:13.4623215Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpluqmexh9/_remote_module_non_scriptable.py 2022-05-18T04:40:13.4662513Z dist init r=0, world=2 2022-05-18T04:40:13.4666683Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:40:13.4849975Z dist init r=1, world=2 2022-05-18T04:40:13.4854614Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:40:13.4855717Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:13.4871799Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:14.8531793Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:14.8532324Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:15.1659299Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:15.1669778Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:15.1692106Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:15.1692790Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:15.1705617Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:15.1706586Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:15.2095416Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:15.2096135Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:15.2106175Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:15.2106853Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:15.2203377Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
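The UserWarning from fully_sharded_data_parallel.py:912 above ("Module is input on CPU, we are moving it to ...") is emitted when a module whose parameters still live on the CPU is handed to FSDP: FSDP moves it to the current CUDA device to verify, flatten, and shard the parameters, and (when CPU offloading is enabled, as in the offload_true tests) moves them back afterwards. A minimal sketch of both setups, assuming the default process group has already been initialized on each rank; the module is illustrative:

import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload

def build_fsdp_models():
    # Call on each rank after dist.init_process_group(...).
    # Placing the module on the local GPU up front avoids the temporary
    # device move (and the warning) when parameters should live on the GPU.
    gpu_module = nn.Linear(8, 8).cuda(torch.cuda.current_device())
    fsdp_on_gpu = FSDP(gpu_module)

    # With CPU offloading the temporary round trip to the GPU is expected;
    # this corresponds to the offload_true variants in the log above.
    fsdp_offloaded = FSDP(nn.Linear(8, 8),
                          cpu_offload=CPUOffload(offload_params=True))
    return fsdp_on_gpu, fsdp_offloaded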
2022-05-18T04:40:15.2206234Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:15.5125593Z ok (3.032s) 2022-05-18T04:40:15.5251511Z test_delayed_reduce_scatter_offload_false_none_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34214 2022-05-18T04:40:15.5356804Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34215 2022-05-18T04:40:16.4507678Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz_its5ct 2022-05-18T04:40:16.4508595Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz_its5ct/_remote_module_non_scriptable.py 2022-05-18T04:40:16.4732795Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf0ahkzv6 2022-05-18T04:40:16.4735551Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf0ahkzv6/_remote_module_non_scriptable.py 2022-05-18T04:40:16.4738646Z dist init r=0, world=2 2022-05-18T04:40:16.4743367Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:40:16.4953307Z dist init r=1, world=2 2022-05-18T04:40:16.4957642Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:40:16.4958629Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:16.5050804Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:17.8893646Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:17.8894180Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:18.1970759Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:18.1971484Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:18.2004020Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:18.2004701Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:18.2005540Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:18.2006171Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:19.6602704Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:19.6603619Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:19.6606471Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:19.6607180Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:19.6705307Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:19.6705808Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:21.1489257Z ok (5.636s) 2022-05-18T04:40:21.1615742Z test_delayed_reduce_scatter_offload_false_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34301 2022-05-18T04:40:21.1722713Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34302 2022-05-18T04:40:22.0557967Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxsz8mro8 2022-05-18T04:40:22.0559293Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxsz8mro8/_remote_module_non_scriptable.py 2022-05-18T04:40:22.0779456Z dist init r=0, world=2 2022-05-18T04:40:22.0783132Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:40:22.1126640Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc1obu3by 2022-05-18T04:40:22.1129295Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc1obu3by/_remote_module_non_scriptable.py 2022-05-18T04:40:22.1347094Z dist init r=1, world=2 2022-05-18T04:40:22.1351430Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:40:22.1352551Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:22.1394881Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:23.4960064Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:23.4960604Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:23.8008685Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:23.8009211Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:23.8042007Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:23.8042667Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:23.8043531Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:23.8044167Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:25.8417002Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
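The "Started process N with pid ..." lines are printed by the distributed test harness as it launches one worker process per rank. A minimal sketch of launching two ranks with torch.multiprocessing.spawn, under the assumption that this mirrors what the harness does; the worker body here is a placeholder, not the test's code.

```python
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    # Each spawned process receives its rank as the first positional argument.
    print(f"dist init r={rank}, world={world_size}")

if __name__ == "__main__":
    world_size = 2
    # Starts world_size processes running `worker` and waits for them to exit.
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)
```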
2022-05-18T04:40:25.8417702Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:25.8419035Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:25.8419688Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:25.8514835Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:25.8515352Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:28.1891820Z ok (7.040s) 2022-05-18T04:40:28.2020585Z test_delayed_reduce_scatter_offload_false_prefetch_post_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34388 2022-05-18T04:40:28.2127135Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34389 2022-05-18T04:40:29.1174098Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiq8dyuad 2022-05-18T04:40:29.1175493Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiq8dyuad/_remote_module_non_scriptable.py 2022-05-18T04:40:29.1182031Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplboi25gl 2022-05-18T04:40:29.1184931Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplboi25gl/_remote_module_non_scriptable.py 2022-05-18T04:40:29.1395132Z dist init r=1, world=2 2022-05-18T04:40:29.1399168Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:40:29.1412089Z dist init r=0, world=2 2022-05-18T04:40:29.1416534Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:40:29.1417801Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:29.1502562Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:30.5299630Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:30.5300167Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:30.8421875Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:30.8422414Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:30.8455322Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:30.8455972Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:30.8456972Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
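The "Added key: store_based_barrier_key:1" and "Completed store-based barrier" messages are logged by torch.distributed while the ranks rendezvous inside init_process_group. A minimal sketch of that initialization, assuming an environment-variable rendezvous on localhost; the address, port, and backend are illustrative choices, not values taken from this CI job.

```python
import os
import torch.distributed as dist

def init_distributed(rank: int, world_size: int) -> None:
    # Rendezvous settings; in CI these normally come from the launcher.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # This call performs the store-based barrier that produces the
    # "store_based_barrier_key" INFO lines above. Use "gloo" on CPU-only hosts.
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)

def teardown() -> None:
    dist.destroy_process_group()
```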
2022-05-18T04:40:30.8457616Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:30.8843420Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:30.8844111Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:30.8849804Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:30.8851027Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:30.8952926Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:30.8953430Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:31.2207899Z ok (3.031s) 2022-05-18T04:40:31.2334401Z test_delayed_reduce_scatter_offload_false_prefetch_post_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34475 2022-05-18T04:40:31.2437794Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34476 2022-05-18T04:40:32.1466327Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8_p8q3sc 2022-05-18T04:40:32.1468117Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8_p8q3sc/_remote_module_non_scriptable.py 2022-05-18T04:40:32.1672126Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo1rf98sr 2022-05-18T04:40:32.1674830Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo1rf98sr/_remote_module_non_scriptable.py 2022-05-18T04:40:32.1699135Z dist init r=1, world=2 2022-05-18T04:40:32.1703728Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:40:32.1893713Z dist init r=0, world=2 2022-05-18T04:40:32.1897615Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:40:32.1898780Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:32.1908879Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:33.5673408Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:33.5673968Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:33.8809800Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:33.8820286Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:33.8843194Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:40:33.8843866Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:33.8856250Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:33.8856893Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:35.9248942Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:35.9249645Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:35.9250956Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:35.9251639Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:35.9350288Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:35.9350806Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:38.2593688Z ok (7.038s) 2022-05-18T04:40:38.2724475Z test_delayed_reduce_scatter_offload_false_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34562 2022-05-18T04:40:38.2832660Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34563 2022-05-18T04:40:39.2143232Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0wtvz92m 2022-05-18T04:40:39.2144079Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0wtvz92m/_remote_module_non_scriptable.py 2022-05-18T04:40:39.2173466Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6zgvxk5_ 2022-05-18T04:40:39.2176299Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6zgvxk5_/_remote_module_non_scriptable.py 2022-05-18T04:40:39.2371044Z dist init r=1, world=2 2022-05-18T04:40:39.2375680Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:40:39.2396442Z dist init r=0, world=2 2022-05-18T04:40:39.2400458Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:40:39.2401771Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:39.2479283Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:40.6041688Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:40.6092896Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:40.9131332Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:40.9141750Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:40:40.9165009Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:40.9165700Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:40.9177327Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:40.9177974Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:42.9555825Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:42.9556532Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:42.9559067Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:42.9559754Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:42.9655418Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:42.9656672Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:45.2990724Z ok (7.040s) 2022-05-18T04:40:45.3117667Z test_delayed_reduce_scatter_offload_false_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34649 2022-05-18T04:40:45.3224170Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34650 2022-05-18T04:40:46.2185472Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpui5nae_r 2022-05-18T04:40:46.2186783Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpui5nae_r/_remote_module_non_scriptable.py 2022-05-18T04:40:46.2206533Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphv1w02cx 2022-05-18T04:40:46.2209708Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphv1w02cx/_remote_module_non_scriptable.py 2022-05-18T04:40:46.2404598Z dist init r=0, world=2 2022-05-18T04:40:46.2408517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:40:46.2436833Z dist init r=1, world=2 2022-05-18T04:40:46.2441154Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:40:46.2442380Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:46.2512111Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:40:47.6118528Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:47.6119066Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:47.9247864Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:47.9257777Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:47.9281166Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:47.9281837Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:47.9292972Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:47.9293620Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:47.9675361Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:47.9676037Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:47.9685323Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:47.9686003Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:47.9782277Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:47.9782763Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:48.3304166Z ok (3.031s) 2022-05-18T04:40:48.3430620Z test_delayed_reduce_scatter_offload_false_prefetch_pre_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34736 2022-05-18T04:40:48.3534332Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34737 2022-05-18T04:40:49.2524172Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpai8y0sdm 2022-05-18T04:40:49.2525488Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpai8y0sdm/_remote_module_non_scriptable.py 2022-05-18T04:40:49.2755181Z dist init r=1, world=2 2022-05-18T04:40:49.2759428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:40:49.2905942Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps1mm1vzn 2022-05-18T04:40:49.2908611Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps1mm1vzn/_remote_module_non_scriptable.py 2022-05-18T04:40:49.3125149Z dist init r=0, world=2 2022-05-18T04:40:49.3129268Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:40:49.3130416Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:49.3168057Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:50.6922237Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:50.6922793Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:51.0012921Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:51.0023607Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:51.0046820Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:51.0047600Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:51.0059193Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:51.0059874Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:53.0473322Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:53.0474089Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:53.0475054Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:40:53.0475691Z warnings.warn(msg, FutureWarning) 2022-05-18T04:40:53.0574900Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
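"Reducer buckets have been rebuilt in this iteration." is logged by DistributedDataParallel's gradient reducer after the first backward pass; the TestParityWithDDP cases run a DDP model as the reference that FSDP is compared against. A minimal sketch of that DDP side with a placeholder model, optimizer, and input; none of these are the test's actual modules.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def ddp_reference_step(local_rank: int) -> None:
    # Assumes the process group is already initialized and one GPU per rank.
    model = nn.Linear(8, 8).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    inputs = torch.randn(4, 8, device=f"cuda:{local_rank}")
    loss = ddp_model(inputs).sum()
    # After this first backward pass the reducer logs the
    # "Reducer buckets have been rebuilt in this iteration." message.
    loss.backward()
    optimizer.step()
```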
2022-05-18T04:40:53.0576269Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:55.3691652Z ok (7.039s) 2022-05-18T04:40:55.3819546Z test_delayed_reduce_scatter_offload_false_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34823 2022-05-18T04:40:55.3925140Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34824 2022-05-18T04:40:56.3010102Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi8t15awt 2022-05-18T04:40:56.3011479Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi8t15awt/_remote_module_non_scriptable.py 2022-05-18T04:40:56.3046970Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvk12u1aq 2022-05-18T04:40:56.3049503Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvk12u1aq/_remote_module_non_scriptable.py 2022-05-18T04:40:56.3236871Z dist init r=0, world=2 2022-05-18T04:40:56.3241354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:40:56.3278074Z dist init r=1, world=2 2022-05-18T04:40:56.3282448Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:40:56.3283854Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:56.3344875Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:40:57.7169736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:40:57.7170480Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:40:58.0251510Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:58.0252230Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:40:58.0285288Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:58.0285988Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:40:58.0286817Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:40:58.0287460Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:00.0656592Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:00.0657331Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:00.0658472Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:00.0659143Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:00.0755755Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:00.0756269Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:02.4081273Z ok (7.039s) 2022-05-18T04:41:02.4208563Z test_delayed_reduce_scatter_offload_true_none_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34910 2022-05-18T04:41:02.4314520Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34911 2022-05-18T04:41:03.3757431Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppxq_jhzt 2022-05-18T04:41:03.3758948Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppxq_jhzt/_remote_module_non_scriptable.py 2022-05-18T04:41:03.3841004Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyq888h40 2022-05-18T04:41:03.3843920Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyq888h40/_remote_module_non_scriptable.py 2022-05-18T04:41:03.3979813Z dist init r=0, world=2 2022-05-18T04:41:03.3983933Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:03.4072827Z dist init r=1, world=2 2022-05-18T04:41:03.4077370Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:03.4078714Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:03.4086842Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:04.8016552Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:04.8017385Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:05.1156855Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:05.1166634Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:05.1191640Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:05.1192353Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:05.1202437Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:05.1203071Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:05.1267112Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:41:05.1268411Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:05.1269696Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:05.1270956Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:05.1282199Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:05.1283471Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:05.1284727Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:05.1285978Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:05.1325887Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:05.1326588Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:05.1871471Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:05.1872170Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:05.1877993Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
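The repeated "[W python_variable.cpp:205] ... Deallocating Tensor that still has live PyObject references" warnings are emitted by PyTorch's C++/Python binding layer, not by the test code; they concern tensors that were reachable only through a weak reference when the C++ side freed them. The snippet below only illustrates ordinary Python weak references to a tensor and does not reproduce the internal code path the warning refers to.

```python
import weakref
import torch

t = torch.ones(3)
ref = weakref.ref(t)   # weak reference to the tensor's Python object
assert ref() is t      # dereferencing while a strong reference still exists

del t
# With no strong references left, the weakref is cleared; the warning above
# is about the analogous cleanup happening on the C++ side of the binding.
assert ref() is None
```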
2022-05-18T04:41:05.1878680Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:05.1975828Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:05.1976366Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:05.5397712Z ok (3.131s) 2022-05-18T04:41:05.5522492Z test_delayed_reduce_scatter_offload_true_none_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 34997 2022-05-18T04:41:05.5629051Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 34998 2022-05-18T04:41:06.4570806Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpacw4369_ 2022-05-18T04:41:06.4574207Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpacw4369_/_remote_module_non_scriptable.py 2022-05-18T04:41:06.4631432Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp257r1xg2 2022-05-18T04:41:06.4633334Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp257r1xg2/_remote_module_non_scriptable.py 2022-05-18T04:41:06.4807765Z dist init r=1, world=2 2022-05-18T04:41:06.4813077Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:06.4853834Z dist init r=0, world=2 2022-05-18T04:41:06.4858214Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:06.4859308Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:06.4916700Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:07.8922493Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:07.8923026Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:08.2015144Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:08.2025395Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:08.2050011Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:08.2050788Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:08.2061360Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:08.2062320Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:08.2126479Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:08.2127794Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:08.2129086Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:08.2130617Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:08.2141041Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:08.2142327Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:08.2143614Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:08.2144865Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:08.2187890Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:08.2189749Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:09.6138204Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:09.6138928Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:09.6141519Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:09.6142199Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:09.6236730Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
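The test names in this run encode three FSDP knobs: offload_{false,true} (CPU offload of parameters), prefetch_{pre,post} or none (backward prefetching), and no_shard / shard_grad_op / none (sharding strategy). A minimal sketch of how one such combination could be expressed when constructing FSDP; the exact mapping from test names to constructor arguments is an assumption, not taken from the harness.

```python
import torch.nn as nn
from torch.distributed.fsdp import (
    BackwardPrefetch,
    CPUOffload,
    FullyShardedDataParallel as FSDP,
    ShardingStrategy,
)

def build_fsdp(model: nn.Module) -> FSDP:
    # Roughly "offload_true_prefetch_post_shard_grad_op" from the test names.
    return FSDP(
        model,
        cpu_offload=CPUOffload(offload_params=True),
        backward_prefetch=BackwardPrefetch.BACKWARD_POST,
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
    )
```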
2022-05-18T04:41:09.6238239Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:10.9753900Z ok (5.435s) 2022-05-18T04:41:10.9881434Z test_delayed_reduce_scatter_offload_true_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35084 2022-05-18T04:41:10.9989046Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35085 2022-05-18T04:41:11.8546418Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphvqeln2t 2022-05-18T04:41:11.8547384Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphvqeln2t/_remote_module_non_scriptable.py 2022-05-18T04:41:11.8768311Z dist init r=1, world=2 2022-05-18T04:41:11.8772555Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:11.9007763Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaufb7v3h 2022-05-18T04:41:11.9010635Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaufb7v3h/_remote_module_non_scriptable.py 2022-05-18T04:41:11.9237765Z dist init r=0, world=2 2022-05-18T04:41:11.9242483Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:11.9243867Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:11.9282523Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:13.3088200Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:13.3088749Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:13.6274216Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:13.6274797Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:13.6307626Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:13.6308436Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:13.6309288Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:13.6309924Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:13.6383508Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:13.6384808Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:41:13.6386078Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:13.6387345Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:13.6388987Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:13.6390262Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:13.6391514Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:13.6392787Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:13.6435289Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:13.6435817Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:15.6943750Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:15.6944475Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:15.6947350Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:15.6948033Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:15.7043232Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:15.7043743Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:18.0154943Z ok (7.040s) 2022-05-18T04:41:18.0280735Z test_delayed_reduce_scatter_offload_true_prefetch_post_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35171 2022-05-18T04:41:18.0383863Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35172 2022-05-18T04:41:18.8879950Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo0wv_hay 2022-05-18T04:41:18.8881093Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo0wv_hay/_remote_module_non_scriptable.py 2022-05-18T04:41:18.9100409Z dist init r=0, world=2 2022-05-18T04:41:18.9104314Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:18.9497867Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3qoxzgyg 2022-05-18T04:41:18.9500616Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3qoxzgyg/_remote_module_non_scriptable.py 2022-05-18T04:41:18.9729245Z dist init r=1, world=2 2022-05-18T04:41:18.9734044Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:18.9735471Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:18.9818203Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:20.3474791Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:20.3475635Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:20.6660512Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:20.6670364Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:20.6693890Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:20.6694584Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:20.6706239Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:20.6706998Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:20.6768420Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:20.6769735Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:20.6771401Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:20.6772665Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:20.6785940Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:20.6787251Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:20.6788525Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:20.6789793Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:20.6827289Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:20.6829542Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:20.7371356Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:20.7372043Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:20.7379907Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:20.7380602Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:20.7473683Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:20.7475328Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:21.1466137Z ok (3.131s) 2022-05-18T04:41:21.1596378Z test_delayed_reduce_scatter_offload_true_prefetch_post_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35258 2022-05-18T04:41:21.1706817Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35259 2022-05-18T04:41:22.0753651Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd2g4dimf 2022-05-18T04:41:22.0756462Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd2g4dimf/_remote_module_non_scriptable.py 2022-05-18T04:41:22.0978993Z dist init r=1, world=2 2022-05-18T04:41:22.0983039Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:22.1195085Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqattllhk 2022-05-18T04:41:22.1198121Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqattllhk/_remote_module_non_scriptable.py 2022-05-18T04:41:22.1425851Z dist init r=0, world=2 2022-05-18T04:41:22.1430463Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:22.1431470Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:22.1493231Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:23.5207266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:23.5207801Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:23.8393930Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:23.8394503Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:23.8427735Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:23.8428409Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:23.8429272Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:23.8430272Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:23.8503224Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:23.8504549Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:23.8505817Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:23.8507092Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:23.8511610Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:23.8512883Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:23.8514149Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:23.8515403Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:23.8559418Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:23.8559914Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:25.9085828Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:25.9087279Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:25.9090605Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:25.9092025Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:25.9192068Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:25.9193328Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:28.2878396Z ok (7.141s) 2022-05-18T04:41:28.3010540Z test_delayed_reduce_scatter_offload_true_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... 
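The repeated [W python_variable.cpp:205] warning names the pattern it objects to: taking a weakref to a Tensor, dereferencing it, and not calling the private _fix_weakref() helper afterwards. A minimal sketch of that pattern, under the assumption that Tensor._fix_weakref() exists in this build (it is a private API, and this sketch does not claim to reproduce the warning):

    import weakref
    import torch

    t = torch.ones(4)
    wr = weakref.ref(t)

    # Dereferencing the weak reference hands back a PyObject for the tensor ...
    resurrected = wr()
    if resurrected is not None:
        # ... and the warning asks for the private helper to be called afterwards.
        resurrected._fix_weakref()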
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35345 2022-05-18T04:41:28.3121320Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35346 2022-05-18T04:41:29.2167556Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps497mexw 2022-05-18T04:41:29.2168995Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps497mexw/_remote_module_non_scriptable.py 2022-05-18T04:41:29.2400093Z dist init r=1, world=2 2022-05-18T04:41:29.2404209Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:29.2491644Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdtsw243p 2022-05-18T04:41:29.2494834Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdtsw243p/_remote_module_non_scriptable.py 2022-05-18T04:41:29.2712843Z dist init r=0, world=2 2022-05-18T04:41:29.2717335Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:29.2718470Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:29.2813310Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:30.6497771Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:30.6498296Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:30.9588823Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:30.9598765Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:30.9622493Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:30.9623178Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:30.9635697Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:30.9636348Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:30.9699985Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:30.9701314Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:30.9702588Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:30.9703863Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:30.9716119Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:30.9717406Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:30.9718675Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:30.9719948Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:30.9762965Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:30.9763746Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:33.0289710Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:33.0290860Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:33.0293535Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:33.0294202Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:33.0389547Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:33.0393145Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:35.4278407Z ok (7.140s) 2022-05-18T04:41:35.4410777Z test_delayed_reduce_scatter_offload_true_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... 
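The UserWarning from fully_sharded_data_parallel.py:912 fires because the module handed to FSDP still lives on CPU, so FSDP moves it to the GPU and back. Placing the module on the local CUDA device before wrapping avoids that round trip. A minimal sketch, assuming a default process group is already initialized (the nn.Linear stand-in is not the module these tests use):

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes torch.distributed.init_process_group(...) has already run on this rank.
    device = torch.device("cuda", torch.cuda.current_device())
    model = nn.Linear(8, 8).to(device)

    # Wrapping a module that already sits on the GPU should not emit
    # "Module is input on CPU, we are moving it to ..." during init.
    fsdp_model = FSDP(model)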
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35432 2022-05-18T04:41:35.4519247Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35433 2022-05-18T04:41:36.3436349Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxi9227d1 2022-05-18T04:41:36.3437334Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxi9227d1/_remote_module_non_scriptable.py 2022-05-18T04:41:36.3567692Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9bkijyz4 2022-05-18T04:41:36.3570322Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9bkijyz4/_remote_module_non_scriptable.py 2022-05-18T04:41:36.3657440Z dist init r=1, world=2 2022-05-18T04:41:36.3661582Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:36.3797414Z dist init r=0, world=2 2022-05-18T04:41:36.3801635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:36.3802677Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:36.3866825Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:37.7815904Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:37.7816477Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:38.0909960Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:38.0910515Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:38.0943782Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:38.0944460Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:38.0945325Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:38.0945958Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:38.1018084Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:38.1019374Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:38.1020650Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:38.1021914Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:38.1023172Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:38.1024441Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:38.1025697Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:38.1027252Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:38.1064739Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:38.1065373Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:38.1609258Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:38.1609930Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:38.1619752Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:38.1620430Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:38.1717469Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:38.1717976Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:38.5602189Z ok (3.132s) 2022-05-18T04:41:38.5733578Z test_delayed_reduce_scatter_offload_true_prefetch_pre_none (__main__.TestParityWithDDP) ... 
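The "Reducer buckets have been rebuilt in this iteration." INFO lines come from DistributedDataParallel after its first backward pass, when the reducer reorders its gradient buckets. A minimal sketch of the wrapping that triggers them, assuming a process group is already initialized (the module and shapes are illustrative):

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    device = torch.device("cuda", torch.cuda.current_device())
    model = nn.Linear(16, 16).to(device)
    ddp_model = DDP(model, device_ids=[device.index])

    loss = ddp_model(torch.randn(4, 16, device=device)).sum()
    loss.backward()  # after the first iteration DDP may log the bucket-rebuild INFO line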
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35519 2022-05-18T04:41:38.5840613Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35520 2022-05-18T04:41:39.4787254Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6_pjkr_m 2022-05-18T04:41:39.4788220Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6_pjkr_m/_remote_module_non_scriptable.py 2022-05-18T04:41:39.4819674Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp563ppltj 2022-05-18T04:41:39.4822457Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp563ppltj/_remote_module_non_scriptable.py 2022-05-18T04:41:39.5005809Z dist init r=1, world=2 2022-05-18T04:41:39.5010282Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:39.5044350Z dist init r=0, world=2 2022-05-18T04:41:39.5048747Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:39.5049539Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:39.5113945Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:40.8452371Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:40.8452907Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:41.1525821Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:41.1526405Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:41.1559978Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:41.1560725Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:41.1561557Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:41.1562476Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:41.1637243Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:41.1638585Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:41.1639869Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:41.1641139Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:41.1642405Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:41.1643661Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:41.1644917Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:41.1646156Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:41.1683292Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:41.1683803Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:43.2202712Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:43.2203448Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:43.2204391Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:43.2205041Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:43.2297637Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:43.2298127Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:45.5995688Z ok (7.039s) 2022-05-18T04:41:45.6126271Z test_delayed_reduce_scatter_offload_true_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35606 2022-05-18T04:41:45.6233097Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35607 2022-05-18T04:41:46.5113191Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp70fg01w9 2022-05-18T04:41:46.5114386Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp70fg01w9/_remote_module_non_scriptable.py 2022-05-18T04:41:46.5210354Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9qiwgy01 2022-05-18T04:41:46.5213880Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9qiwgy01/_remote_module_non_scriptable.py 2022-05-18T04:41:46.5334152Z dist init r=0, world=2 2022-05-18T04:41:46.5338810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:46.5442326Z dist init r=1, world=2 2022-05-18T04:41:46.5446557Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:46.5447745Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:46.5543987Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:47.9111824Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:47.9112369Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:48.2159150Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:48.2160481Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:48.2193107Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:48.2193788Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:48.2195312Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:48.2195948Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:48.2270866Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:48.2272198Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:48.2273474Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:48.2274974Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:48.2276651Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:48.2277936Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:48.2279212Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:48.2280485Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:41:48.2320313Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:48.2321202Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:50.2844663Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:50.2845373Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:50.2847739Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:50.2848430Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:50.2942338Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:50.2943318Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:41:52.6386635Z ok (7.039s) 2022-05-18T04:41:52.6512878Z test_mixture_of_experts_offload_false_none_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35693 2022-05-18T04:41:52.6619841Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35694 2022-05-18T04:41:53.5602109Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppy2_oj4e 2022-05-18T04:41:53.5603256Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppy2_oj4e/_remote_module_non_scriptable.py 2022-05-18T04:41:53.5660708Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt7ri4otn 2022-05-18T04:41:53.5663676Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt7ri4otn/_remote_module_non_scriptable.py 2022-05-18T04:41:53.5833978Z dist init r=0, world=2 2022-05-18T04:41:53.5838345Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:53.5884092Z dist init r=1, world=2 2022-05-18T04:41:53.5888288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:53.5889392Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:53.5942023Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:54.9531022Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:54.9531773Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:55.2634839Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:55.2635579Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:55.2703567Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:55.2704441Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:55.2729322Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:41:55.2750475Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:41:55.2751498Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:41:55.2832724Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:41:55.3421516Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:55.3422250Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:55.3428983Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:55.3429661Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:55.3565937Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:41:55.3578379Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:41:55.3579076Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:41:55.3668890Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:41:55.7700462Z ok (3.131s) 2022-05-18T04:41:55.7827118Z test_mixture_of_experts_offload_false_none_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35800 2022-05-18T04:41:55.7931926Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35801 2022-05-18T04:41:56.7015834Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6hw8x_4r 2022-05-18T04:41:56.7016934Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6hw8x_4r/_remote_module_non_scriptable.py 2022-05-18T04:41:56.7234799Z dist init r=0, world=2 2022-05-18T04:41:56.7239028Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:56.7462613Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphc73tj2t 2022-05-18T04:41:56.7465218Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphc73tj2t/_remote_module_non_scriptable.py 2022-05-18T04:41:56.7691104Z dist init r=1, world=2 2022-05-18T04:41:56.7695969Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:56.7697239Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:56.7749375Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:58.1442999Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:41:58.1443542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:41:58.4526587Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:41:58.4527304Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:58.4568421Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:41:58.4569081Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:41:58.4595384Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:41:58.4615343Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:41:58.4616056Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:41:58.4698477Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:41:58.5289774Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:58.5290820Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:58.5297248Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:41:58.5297940Z warnings.warn(msg, FutureWarning) 2022-05-18T04:41:58.5433605Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:41:58.5445082Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:41:58.5446216Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:41:58.5536659Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:41:58.9015150Z ok (3.131s) 2022-05-18T04:41:58.9144120Z test_mixture_of_experts_offload_false_none_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35907 2022-05-18T04:41:58.9247197Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35908 2022-05-18T04:41:59.8334494Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2tqkgbf6 2022-05-18T04:41:59.8335736Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2tqkgbf6/_remote_module_non_scriptable.py 2022-05-18T04:41:59.8369592Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpioq26lvp 2022-05-18T04:41:59.8372492Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpioq26lvp/_remote_module_non_scriptable.py 2022-05-18T04:41:59.8556201Z dist init r=0, world=2 2022-05-18T04:41:59.8560683Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:41:59.8599300Z dist init r=1, world=2 2022-05-18T04:41:59.8603356Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:41:59.8604602Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:41:59.8664231Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
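The "dist init r=N, world=2" prints and the surrounding store_based_barrier_key INFO lines correspond to process-group initialization on each rank; init_process_group ends with a store-based barrier across all ranks. A rough sketch of what each spawned rank runs (the backend and rendezvous settings are assumptions; the test harness wires up its own store):

    import os
    import torch.distributed as dist

    def init_rank(rank: int, world_size: int = 2) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # assumed rendezvous settings
        os.environ.setdefault("MASTER_PORT", "29500")
        print(f"dist init r={rank}, world={world_size}")
        # Finishes with a store-based barrier, which produces the
        # "Added key: store_based_barrier_key:1" / "Completed store-based barrier" lines.
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)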
2022-05-18T04:42:01.2238756Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:01.2239300Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:01.5362282Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:01.5362978Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:01.5371332Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:01.5371987Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:01.5406454Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:01.5415980Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:01.5417012Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:01.5509672Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:01.5866532Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:01.5867238Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:01.5868171Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:01.5868832Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:01.6006857Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:01.6010408Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:01.6011609Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:01.6110015Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:01.9326334Z ok (3.031s) 2022-05-18T04:42:01.9457724Z test_mixture_of_experts_offload_false_none_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 35998 2022-05-18T04:42:01.9563977Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 35999 2022-05-18T04:42:02.8521510Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2waq5s70 2022-05-18T04:42:02.8523093Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2waq5s70/_remote_module_non_scriptable.py 2022-05-18T04:42:02.8752031Z dist init r=1, world=2 2022-05-18T04:42:02.8756338Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:02.8900280Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfo2d30_y 2022-05-18T04:42:02.8903077Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfo2d30_y/_remote_module_non_scriptable.py 2022-05-18T04:42:02.9121169Z dist init r=0, world=2 2022-05-18T04:42:02.9125474Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:02.9126662Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:02.9164787Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:04.2951327Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:04.2951883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:04.6043390Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:04.6044137Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:04.6050762Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:04.6051616Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:04.6086773Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:04.6097467Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:04.6098487Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:04.6189891Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:04.6551695Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:04.6552655Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:04.6554387Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:04.6555062Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:04.6691521Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:04.6703471Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:04.6704435Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:04.6794509Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:05.0657976Z ok (3.133s) 2022-05-18T04:42:05.0787528Z test_mixture_of_experts_offload_false_none_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36089 2022-05-18T04:42:05.0895334Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36090 2022-05-18T04:42:05.9821711Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5zpdvi1a 2022-05-18T04:42:05.9822552Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5zpdvi1a/_remote_module_non_scriptable.py 2022-05-18T04:42:06.0042276Z dist init r=1, world=2 2022-05-18T04:42:06.0046627Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:06.0225657Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdwmx75pc 2022-05-18T04:42:06.0228649Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdwmx75pc/_remote_module_non_scriptable.py 2022-05-18T04:42:06.0448025Z dist init r=0, world=2 2022-05-18T04:42:06.0452437Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:06.0453880Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:06.0454594Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:07.4068266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:07.7164479Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:07.7166086Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:07.7166767Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:07.7167621Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:42:07.7168260Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:07.7208124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:07.7209241Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:07.7210548Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:07.7211278Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:07.7559044Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:07.7560504Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:07.7561564Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:07.7562486Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:07.7696477Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:07.7700168Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:07.7701719Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:07.7799727Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:08.0973993Z ok (3.031s) 2022-05-18T04:42:08.1101012Z test_mixture_of_experts_offload_false_none_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36180 2022-05-18T04:42:08.1204672Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36181 2022-05-18T04:42:09.0201701Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0jt7k0nv 2022-05-18T04:42:09.0203195Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0jt7k0nv/_remote_module_non_scriptable.py 2022-05-18T04:42:09.0204773Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpocg4gq75 2022-05-18T04:42:09.0207799Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpocg4gq75/_remote_module_non_scriptable.py 2022-05-18T04:42:09.0424511Z dist init r=0, world=2 2022-05-18T04:42:09.0429168Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:09.0434475Z dist init r=1, world=2 2022-05-18T04:42:09.0438893Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:09.0440487Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:42:09.0533131Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:10.4295601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:10.4296334Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:10.7362973Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:10.7363704Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:10.7398101Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:10.7398777Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:10.7425055Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:10.7444053Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:10.7445439Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:10.7529048Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:10.7885304Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:10.7886221Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:10.7888135Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:10.7888928Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:10.8025004Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:10.8034613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:10.8035963Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:10.8127995Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:11.1283413Z ok (3.031s) 2022-05-18T04:42:11.1408849Z test_mixture_of_experts_offload_false_prefetch_post_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
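Each "Started process N with pid ..." line corresponds to the harness spawning one Python process per rank. A hedged sketch of that pattern with torch.multiprocessing (the real launcher lives in torch.testing._internal.common_distributed and differs in detail):

    import torch.multiprocessing as mp

    def worker(rank: int, world_size: int) -> None:
        # Per-rank body: process-group init and the actual test would run here.
        print(f"dist init r={rank}, world={world_size}")

    if __name__ == "__main__":
        world_size = 2
        mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)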
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36271 2022-05-18T04:42:11.1512771Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36272 2022-05-18T04:42:12.0527110Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdlalt665 2022-05-18T04:42:12.0528413Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdlalt665/_remote_module_non_scriptable.py 2022-05-18T04:42:12.0565625Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyzsgyj5h 2022-05-18T04:42:12.0568278Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyzsgyj5h/_remote_module_non_scriptable.py 2022-05-18T04:42:12.0756654Z dist init r=1, world=2 2022-05-18T04:42:12.0761069Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:12.0792968Z dist init r=0, world=2 2022-05-18T04:42:12.0797406Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:12.0798672Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:12.0865140Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:13.4630282Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:13.4630842Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:13.7708004Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:13.7708715Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:13.7802109Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:13.7802771Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:13.7831089Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:13.7848657Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:13.7849482Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:13.7934248Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:13.8533664Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:13.8534737Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:13.8536169Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:13.8536833Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:13.8680890Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:13.8684795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:13.8685958Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:13.8783913Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:14.2595851Z ok (3.131s) 2022-05-18T04:42:14.2725483Z test_mixture_of_experts_offload_false_prefetch_post_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36378 2022-05-18T04:42:14.2830600Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36379 2022-05-18T04:42:15.1781107Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfekzqu9n 2022-05-18T04:42:15.1782257Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfekzqu9n/_remote_module_non_scriptable.py 2022-05-18T04:42:15.1929608Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplg39x5w2 2022-05-18T04:42:15.1932737Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplg39x5w2/_remote_module_non_scriptable.py 2022-05-18T04:42:15.2000602Z dist init r=0, world=2 2022-05-18T04:42:15.2005109Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:15.2162444Z dist init r=1, world=2 2022-05-18T04:42:15.2167174Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:15.2168305Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:15.2211055Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:16.5950605Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:16.5951365Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:16.9032279Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:16.9032994Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:16.9073060Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:42:16.9073712Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:16.9099493Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:16.9120539Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:16.9121543Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:16.9202619Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:16.9797990Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:16.9798738Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:16.9806926Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:16.9807597Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:16.9947613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:16.9959719Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:16.9960732Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:17.0050654Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:17.3912788Z ok (3.132s) 2022-05-18T04:42:17.4044298Z test_mixture_of_experts_offload_false_prefetch_post_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36485 2022-05-18T04:42:17.4153319Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36486 2022-05-18T04:42:18.3454128Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl3d3_hy4 2022-05-18T04:42:18.3456374Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl3d3_hy4/_remote_module_non_scriptable.py 2022-05-18T04:42:18.3524155Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpezw5a7qq 2022-05-18T04:42:18.3526797Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpezw5a7qq/_remote_module_non_scriptable.py 2022-05-18T04:42:18.3676664Z dist init r=0, world=2 2022-05-18T04:42:18.3680839Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:18.3755007Z dist init r=1, world=2 2022-05-18T04:42:18.3759304Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:18.3760830Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
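Editor's note: the UserWarning from fully_sharded_data_parallel.py:912 above is emitted when a CPU-resident module is handed to FSDP, which then moves it to the current CUDA device and back. A hedged sketch of wrapping the module only after placing it on the local device, so that round trip (and the warning) is not needed; the model, backend, and rendezvous setup here are assumptions for illustration, not the test suite's code.

```python
# Hedged sketch: put the module on the local CUDA device before wrapping it
# with FSDP so no CPU -> GPU -> CPU move is required.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def build_fsdp_model(rank: int, world_size: int) -> FSDP:
    # Assumes MASTER_ADDR / MASTER_PORT are already set in the environment.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    model = torch.nn.Linear(8, 8).cuda(rank)  # already on GPU: warning is not emitted
    return FSDP(model)
```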
2022-05-18T04:42:18.3784152Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:19.7736647Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:19.7737247Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:20.0827368Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:20.0828082Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:20.0868138Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:20.0894683Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:20.0895373Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:20.0914772Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:20.0915715Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:20.0997836Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:20.1367059Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:20.1367771Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:20.1370443Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:20.1371309Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:20.1508445Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:20.1517911Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:20.1518843Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:20.1611208Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:20.5235804Z ok (3.132s) 2022-05-18T04:42:20.5367691Z test_mixture_of_experts_offload_false_prefetch_post_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36576 2022-05-18T04:42:20.5475921Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36577 2022-05-18T04:42:21.4491402Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwmbdd4zr 2022-05-18T04:42:21.4492546Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwmbdd4zr/_remote_module_non_scriptable.py 2022-05-18T04:42:21.4720380Z dist init r=0, world=2 2022-05-18T04:42:21.4724503Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:21.4766033Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9yx354_4 2022-05-18T04:42:21.4768390Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9yx354_4/_remote_module_non_scriptable.py 2022-05-18T04:42:21.4986364Z dist init r=1, world=2 2022-05-18T04:42:21.4990590Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:21.4991421Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:21.5031820Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:22.8966579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:22.8967120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:23.2026783Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:23.2027844Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:23.2177838Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:23.2178528Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:23.2203055Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:23.2223660Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:23.2224392Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:23.2306061Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:23.2672475Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:23.2673165Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:23.2675172Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:23.2675829Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:23.2812573Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:23.2821318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:23.2822281Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:23.2915731Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:23.6557663Z ok (3.132s) 2022-05-18T04:42:23.6688195Z test_mixture_of_experts_offload_false_prefetch_post_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36667 2022-05-18T04:42:23.6795551Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36668 2022-05-18T04:42:24.5920983Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3vl9acqn 2022-05-18T04:42:24.5922342Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3vl9acqn/_remote_module_non_scriptable.py 2022-05-18T04:42:24.6149893Z dist init r=1, world=2 2022-05-18T04:42:24.6153887Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:24.6166158Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq41ffqrd 2022-05-18T04:42:24.6168858Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq41ffqrd/_remote_module_non_scriptable.py 2022-05-18T04:42:24.6387300Z dist init r=0, world=2 2022-05-18T04:42:24.6391245Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:24.6392168Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:24.6460924Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:26.0225036Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:26.0225614Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:26.3380517Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:26.3381250Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:26.3383414Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:42:26.3384218Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:26.3423216Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:26.3429852Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:26.3430775Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:26.3526302Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:26.3886796Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:26.3887472Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:26.3889939Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:26.3890828Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:26.4027672Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:26.4037065Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:26.4038072Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:26.4130671Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:26.7878007Z ok (3.132s) 2022-05-18T04:42:26.8009962Z test_mixture_of_experts_offload_false_prefetch_post_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36758 2022-05-18T04:42:26.8118695Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36759 2022-05-18T04:42:27.7221502Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyvefw0py 2022-05-18T04:42:27.7222505Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyvefw0py/_remote_module_non_scriptable.py 2022-05-18T04:42:27.7247182Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl8kiay7y 2022-05-18T04:42:27.7249758Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl8kiay7y/_remote_module_non_scriptable.py 2022-05-18T04:42:27.7442034Z dist init r=0, world=2 2022-05-18T04:42:27.7446112Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:27.7478504Z dist init r=1, world=2 2022-05-18T04:42:27.7482910Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:27.7484103Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
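Editor's note: the "dist init r=<rank>, world=2" lines and the store_based_barrier_key INFO messages correspond to each spawned worker calling torch.distributed.init_process_group, which synchronizes the two ranks through a store-based barrier. A hedged, self-contained sketch of that startup pattern; the backend, address, and port are placeholders, not values from this job.

```python
# Hedged sketch of the per-test startup visible in the log: two processes are
# spawned, each initializes the process group (which logs the store-based
# barrier keys), then they synchronize and shut down.
import os
import torch.distributed as dist
import torch.multiprocessing as mp

def _worker(rank: int, world_size: int) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")  # placeholder rendezvous address
    os.environ.setdefault("MASTER_PORT", "29500")      # placeholder port
    print(f"dist init r={rank}, world={world_size}")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    dist.barrier()                                      # all ranks reach this point
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(_worker, args=(2,), nprocs=2)
```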
2022-05-18T04:42:27.7549459Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:29.1287716Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:29.1288290Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:29.4377238Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:29.4377938Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:29.4497265Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:29.4497920Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:29.4523239Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:29.4543823Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:29.4544598Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:29.4626166Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:29.4986333Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:29.4987023Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:29.4988141Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:29.4988813Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:29.5124483Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:29.5135946Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:29.5136779Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:29.5227565Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:29.8198120Z ok (3.032s) 2022-05-18T04:42:29.8327886Z test_mixture_of_experts_offload_false_prefetch_pre_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36849 2022-05-18T04:42:29.8435194Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36850 2022-05-18T04:42:30.7361504Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppay53c_c 2022-05-18T04:42:30.7362318Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppay53c_c/_remote_module_non_scriptable.py 2022-05-18T04:42:30.7419860Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwoiacido 2022-05-18T04:42:30.7422814Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwoiacido/_remote_module_non_scriptable.py 2022-05-18T04:42:30.7584039Z dist init r=0, world=2 2022-05-18T04:42:30.7588392Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:30.7653356Z dist init r=1, world=2 2022-05-18T04:42:30.7657929Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:30.7659434Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:30.7691617Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:32.1593242Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:32.1593779Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:32.4692754Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:32.4693525Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:32.4740937Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:32.4741589Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:32.4766978Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:32.4788039Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:32.4788958Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:32.4869965Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:32.5469673Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:32.5470407Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:32.5474796Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:32.5475445Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:32.5614160Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:32.5625702Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:32.5626678Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:32.5717098Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:32.9517774Z ok (3.132s) 2022-05-18T04:42:32.9644852Z test_mixture_of_experts_offload_false_prefetch_pre_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 36956 2022-05-18T04:42:32.9749346Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 36957 2022-05-18T04:42:33.8778783Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6pvgx3rb 2022-05-18T04:42:33.8779898Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6pvgx3rb/_remote_module_non_scriptable.py 2022-05-18T04:42:33.8922125Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpscca2tcw 2022-05-18T04:42:33.8925276Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpscca2tcw/_remote_module_non_scriptable.py 2022-05-18T04:42:33.8998265Z dist init r=1, world=2 2022-05-18T04:42:33.9002492Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:33.9155645Z dist init r=0, world=2 2022-05-18T04:42:33.9160329Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:33.9161511Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:33.9207728Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:35.2978381Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:35.2978899Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:35.6127624Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:35.6128352Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:35.6167241Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:42:35.6167929Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:35.6193579Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:35.6214646Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:35.6215772Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:35.6296658Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:35.6893889Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:35.6894672Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:35.6900648Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:35.6901325Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:35.7038253Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:35.7049632Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:35.7050725Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:35.7140946Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:36.0832603Z ok (3.131s) 2022-05-18T04:42:36.0958226Z test_mixture_of_experts_offload_false_prefetch_pre_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37063 2022-05-18T04:42:36.1062436Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37064 2022-05-18T04:42:37.0108028Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxm11wl5h 2022-05-18T04:42:37.0109665Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxm11wl5h/_remote_module_non_scriptable.py 2022-05-18T04:42:37.0340907Z dist init r=1, world=2 2022-05-18T04:42:37.0345792Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:37.0361347Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy5u0sxei 2022-05-18T04:42:37.0364178Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy5u0sxei/_remote_module_non_scriptable.py 2022-05-18T04:42:37.0584813Z dist init r=0, world=2 2022-05-18T04:42:37.0588972Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:37.0590078Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
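Editor's note: test names such as test_mixture_of_experts_offload_false_prefetch_pre_none_clip_norm_type_2_0 encode one combination of configuration knobs each (CPU offload, backward prefetch, sharding strategy, gradient-clipping norm type). A hedged illustration of how such a combinatorial naming scheme can be generated; this is not PyTorch's actual test parametrization code, just a sketch of the idea.

```python
# Hedged illustration only: one test per combination of the knobs that appear
# in the generated names above.
import itertools

offload_params = ["false", "true"]
backward_prefetch = ["none", "prefetch_pre", "prefetch_post"]
sharding_strategy = ["none", "shard_grad_op", "no_shard"]
clip_norm_type = ["None", "2_0"]

for o, p, s, c in itertools.product(offload_params, backward_prefetch,
                                    sharding_strategy, clip_norm_type):
    print(f"test_mixture_of_experts_offload_{o}_{p}_{s}_clip_norm_type_{c}")
```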
2022-05-18T04:42:37.0652887Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:38.4433384Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:38.4433929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:38.7525047Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:38.7525741Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:38.7534262Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:38.7534906Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:38.7568300Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:38.7582057Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:38.7583269Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:38.7671419Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:38.8040948Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:38.8041644Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:38.8044486Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:38.8045162Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:38.8184595Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:38.8196841Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:38.8198043Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:38.8287351Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:39.2144091Z ok (3.131s) 2022-05-18T04:42:39.2272530Z test_mixture_of_experts_offload_false_prefetch_pre_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37154 2022-05-18T04:42:39.2377429Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37155 2022-05-18T04:42:40.1402436Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpubzc98vm 2022-05-18T04:42:40.1403271Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpubzc98vm/_remote_module_non_scriptable.py 2022-05-18T04:42:40.1515912Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxxxlqgi9 2022-05-18T04:42:40.1518602Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxxxlqgi9/_remote_module_non_scriptable.py 2022-05-18T04:42:40.1632588Z dist init r=1, world=2 2022-05-18T04:42:40.1637270Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:40.1743524Z dist init r=0, world=2 2022-05-18T04:42:40.1747790Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:40.1748842Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:40.1842634Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:41.5665438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:41.5666027Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:41.8792162Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:41.8792878Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:41.8836628Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:41.8837274Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:41.8862748Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:41.8882468Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:41.8883473Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:41.8965913Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:41.9326895Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:41.9327621Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:41.9329268Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:41.9330525Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:41.9464998Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:41.9476000Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:41.9477127Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:41.9567731Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:42.3461152Z ok (3.132s) 2022-05-18T04:42:42.3588569Z test_mixture_of_experts_offload_false_prefetch_pre_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37245 2022-05-18T04:42:42.3695798Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37246 2022-05-18T04:42:43.2688895Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo4kjd1rt 2022-05-18T04:42:43.2689991Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo4kjd1rt/_remote_module_non_scriptable.py 2022-05-18T04:42:43.2920670Z dist init r=1, world=2 2022-05-18T04:42:43.2925209Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:43.2960716Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphwhmdp0a 2022-05-18T04:42:43.2963647Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphwhmdp0a/_remote_module_non_scriptable.py 2022-05-18T04:42:43.3184597Z dist init r=0, world=2 2022-05-18T04:42:43.3189113Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:43.3190245Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:43.3233000Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:44.7184986Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:44.7185985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:45.0316960Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:45.0318351Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:45.0326027Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:42:45.0327360Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:45.0359251Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:45.0375081Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:45.0376449Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:45.0462927Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:45.0821944Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:45.0823715Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:45.0825680Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:45.0827217Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:45.0961385Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:45.0973148Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:45.0974610Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:45.1064648Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:45.4777238Z ok (3.131s) 2022-05-18T04:42:45.4904681Z test_mixture_of_experts_offload_false_prefetch_pre_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37336 2022-05-18T04:42:45.5009576Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37337 2022-05-18T04:42:46.4099811Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc7olnpal 2022-05-18T04:42:46.4100903Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc7olnpal/_remote_module_non_scriptable.py 2022-05-18T04:42:46.4201568Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphe23a_ml 2022-05-18T04:42:46.4204554Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphe23a_ml/_remote_module_non_scriptable.py 2022-05-18T04:42:46.4322458Z dist init r=1, world=2 2022-05-18T04:42:46.4326893Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:46.4435974Z dist init r=0, world=2 2022-05-18T04:42:46.4440441Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:46.4441623Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
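Editor's note: those name components map onto FSDP constructor options. A hedged sketch of what one such configuration could look like when wrapping a model; the option values are illustrative, the model is a stand-in rather than the mixture-of-experts module used by the test, and an initialized process group is assumed.

```python
# Hedged sketch of one point in the matrix above:
# offload_false + prefetch_pre + shard_grad_op. Values are illustrative.
import torch
from torch.distributed.fsdp import (
    BackwardPrefetch,
    CPUOffload,
    FullyShardedDataParallel as FSDP,
    ShardingStrategy,
)

def wrap(model: torch.nn.Module) -> FSDP:
    # Assumes torch.distributed.init_process_group() has already been called.
    return FSDP(
        model,
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
        backward_prefetch=BackwardPrefetch.BACKWARD_PRE,
        cpu_offload=CPUOffload(offload_params=False),
    )
```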
2022-05-18T04:42:46.4532189Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:47.8390601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:47.8391594Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:48.1458260Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:48.1459585Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:48.1535614Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:48.1536938Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:48.1561864Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:48.1583844Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:48.1585169Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:48.1665450Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:48.2020518Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:48.2021904Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:48.2024050Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:48.2025437Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:48.2159694Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:48.2171689Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:48.2173158Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:48.2263112Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:48.6098598Z ok (3.132s) 2022-05-18T04:42:48.6230617Z test_mixture_of_experts_offload_true_none_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37427 2022-05-18T04:42:48.6340770Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37428 2022-05-18T04:42:49.5392854Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpym8ve605 2022-05-18T04:42:49.5393680Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpym8ve605/_remote_module_non_scriptable.py 2022-05-18T04:42:49.5409631Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpch55sj56 2022-05-18T04:42:49.5412552Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpch55sj56/_remote_module_non_scriptable.py 2022-05-18T04:42:49.5613277Z dist init r=1, world=2 2022-05-18T04:42:49.5617514Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:49.5633632Z dist init r=0, world=2 2022-05-18T04:42:49.5637701Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:49.5638545Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:49.5720780Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:50.9093628Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:50.9094202Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:51.2159143Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:51.2159872Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:51.2227512Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:51.2228157Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:51.2253754Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:51.2271557Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:51.2272497Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:51.2356973Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:51.2497195Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:51.2504223Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:51.2504926Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:51.2515866Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:51.2599864Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:51.2610498Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:51.3365747Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:51.3366501Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:51.3367462Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:51.3368118Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:51.3500435Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:42:51.3503940Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:42:51.3504662Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:42:51.3549844Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:51.3551154Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:51.3552438Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:51.3603205Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:42:51.3647956Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:42:51.3649356Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:51.3650889Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:51.7433947Z ok (3.133s) 2022-05-18T04:42:51.7561426Z test_mixture_of_experts_offload_true_none_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37536 2022-05-18T04:42:51.7666649Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37537 2022-05-18T04:42:52.6926611Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy9mt191l 2022-05-18T04:42:52.6927703Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy9mt191l/_remote_module_non_scriptable.py 2022-05-18T04:42:52.7043600Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyvq3zzsr 2022-05-18T04:42:52.7046137Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyvq3zzsr/_remote_module_non_scriptable.py 2022-05-18T04:42:52.7157359Z dist init r=1, world=2 2022-05-18T04:42:52.7161985Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:52.7263407Z dist init r=0, world=2 2022-05-18T04:42:52.7267660Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:52.7268459Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:52.7367698Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:54.1074476Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:54.1075035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:54.4147579Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:54.4148296Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:54.4149147Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:54.4149787Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:54.4191134Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:54.4192254Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
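
The UserWarning from fully_sharded_data_parallel.py above is emitted when a module still on CPU is handed to FSDP; the wrapper temporarily moves it to the current CUDA device for parameter verification, flattening, and sharding, then moves it back. Moving the module to the device before wrapping avoids the round trip. A minimal sketch, assuming a default process group has already been initialized; the module and sizes are placeholders:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    device = torch.device("cuda", torch.cuda.current_device())
    module = nn.Linear(32, 32).to(device)   # on-GPU before wrapping: no "Module is input on CPU" warning
    sharded = FSDP(module)
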
2022-05-18T04:42:54.4192804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:54.4193943Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:54.4343026Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:54.4350752Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:54.4351845Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:54.4363592Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:54.4446051Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:54.4457245Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:54.5220203Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:54.5220953Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:54.5224109Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:54.5224808Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:54.5358385Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:42:54.5368423Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:42:54.5369455Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:42:54.5417383Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:54.5418670Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:42:54.5419958Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:54.5461292Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:42:54.5506883Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:54.5508393Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:54.5509765Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:54.9751605Z ok (3.232s) 2022-05-18T04:42:54.9879355Z test_mixture_of_experts_offload_true_none_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37645 2022-05-18T04:42:54.9986336Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37646 2022-05-18T04:42:55.8982017Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiv6at2ip 2022-05-18T04:42:55.8983023Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiv6at2ip/_remote_module_non_scriptable.py 2022-05-18T04:42:55.9013909Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpji26epu6 2022-05-18T04:42:55.9016928Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpji26epu6/_remote_module_non_scriptable.py 2022-05-18T04:42:55.9203359Z dist init r=1, world=2 2022-05-18T04:42:55.9207795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:55.9242744Z dist init r=0, world=2 2022-05-18T04:42:55.9247223Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:55.9248496Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:55.9311225Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
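
The "dist init r=N, world=2" and "store_based_barrier_key" lines above come from each worker initializing its process group; the store-based barrier is how the group-creation call confirms that all ranks have checked in before collectives run, which is why every key reports "Completed ... with 2 nodes". A rough sketch of that per-rank setup, with the address, port, and backend as placeholder assumptions rather than the harness's actual values:

    import os
    import torch.distributed as dist

    def dist_init(rank: int, world_size: int = 2) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")   # assumed rendezvous endpoint
        os.environ.setdefault("MASTER_PORT", "29500")       # assumed port
        # This call writes/reads the store_based_barrier_key entries seen in the log
        # while it waits for all world_size ranks to join the group.
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
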
2022-05-18T04:42:57.3077562Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:42:57.3078131Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:42:57.6158254Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:57.6159055Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:57.6234088Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:42:57.6234741Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:42:57.6261257Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:42:57.6280607Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:42:57.6281627Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:57.6364435Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:42:57.6512329Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:42:57.6518762Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:42:57.6519461Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:57.6530271Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:57.6614481Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:42:57.6625749Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:57.7106765Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:42:57.7107438Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:57.7110498Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:42:57.7111162Z warnings.warn(msg, FutureWarning) 2022-05-18T04:42:57.7243761Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:42:57.7253586Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:42:57.7254745Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:42:57.7305119Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:57.7306441Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:57.7307717Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:57.7346698Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:42:57.7395283Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:57.7396564Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:57.7398142Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:42:58.1068665Z ok (3.132s) 2022-05-18T04:42:58.1196684Z test_mixture_of_experts_offload_true_none_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
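
The FutureWarning repeated throughout these runs is the deprecation of torch.testing.assert_allclose in favor of torch.testing.assert_close (the linked issue #61844 has the full upgrade notes). A small before/after sketch; the tensors and tolerances are illustrative:

    import torch
    from torch.testing import assert_close

    actual = torch.tensor([1.0, 2.0, 3.0])
    expected = torch.tensor([1.0, 2.0, 3.0 + 1e-7])

    # Deprecated form:
    #   torch.testing.assert_allclose(actual, expected)
    # Replacement; tolerances default from the dtype and can be overridden explicitly.
    assert_close(actual, expected, rtol=1.3e-6, atol=1e-5)
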
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37738 2022-05-18T04:42:58.1301615Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37739 2022-05-18T04:42:59.0301630Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3j422d37 2022-05-18T04:42:59.0302977Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3j422d37/_remote_module_non_scriptable.py 2022-05-18T04:42:59.0533888Z dist init r=0, world=2 2022-05-18T04:42:59.0538271Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:42:59.0643317Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4iv3252l 2022-05-18T04:42:59.0645962Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4iv3252l/_remote_module_non_scriptable.py 2022-05-18T04:42:59.0862654Z dist init r=1, world=2 2022-05-18T04:42:59.0866876Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:42:59.0868000Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:42:59.0947236Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:00.4829684Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:00.4830224Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:00.7893831Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:00.7894512Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:00.7907333Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:00.7907960Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:00.7937504Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:00.7951729Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:00.7952519Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:00.8040595Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:00.8186614Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:00.8194156Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:00.8194837Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:00.8205709Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:00.8289546Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:00.8300881Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:00.8771900Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:00.8772587Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:00.8774912Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:00.8775572Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:00.8907916Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:00.8908798Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:00.8909485Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:00.8910513Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:00.8958695Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:00.8960337Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:00.8962308Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:00.8963595Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:43:00.8964872Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:00.8966151Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:01.2381656Z ok (3.131s) 2022-05-18T04:43:01.2507849Z test_mixture_of_experts_offload_true_none_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37831 2022-05-18T04:43:01.2612034Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37832 2022-05-18T04:43:02.1593544Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5mmjerjl 2022-05-18T04:43:02.1594730Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5mmjerjl/_remote_module_non_scriptable.py 2022-05-18T04:43:02.1824445Z dist init r=1, world=2 2022-05-18T04:43:02.1829014Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:02.2002668Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0m945lr_ 2022-05-18T04:43:02.2005203Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0m945lr_/_remote_module_non_scriptable.py 2022-05-18T04:43:02.2228583Z dist init r=0, world=2 2022-05-18T04:43:02.2233630Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:02.2235028Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:02.2237308Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:03.5974120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:03.5974676Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:03.9091326Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:03.9092044Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:03.9124342Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:43:03.9125008Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:03.9152228Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:03.9171719Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:03.9172614Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:03.9255665Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:03.9404299Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:03.9411208Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:03.9411887Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:03.9423481Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:03.9507314Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:03.9518774Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:03.9999921Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:04.0000625Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:04.0002375Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:04.0003042Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:04.0141334Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:04.0149625Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:04.0150630Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:04.0201922Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:43:04.0203337Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:04.0204616Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:04.0244498Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:04.0293815Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:04.0295112Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:04.0296391Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:04.3694295Z ok (3.131s) 2022-05-18T04:43:04.3821319Z test_mixture_of_experts_offload_true_none_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 37924 2022-05-18T04:43:04.3924401Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 37925 2022-05-18T04:43:05.2887752Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe9l5qshy 2022-05-18T04:43:05.2888533Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe9l5qshy/_remote_module_non_scriptable.py 2022-05-18T04:43:05.2920966Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpidvlctsw 2022-05-18T04:43:05.2924462Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpidvlctsw/_remote_module_non_scriptable.py 2022-05-18T04:43:05.3107467Z dist init r=1, world=2 2022-05-18T04:43:05.3111524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:05.3154252Z dist init r=0, world=2 2022-05-18T04:43:05.3158769Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:05.3160178Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:05.3215248Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
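
The repeated "[W python_variable.cpp:205]" message appears to be emitted from PyTorch internals when a Tensor is deallocated while a weak reference to its Python object is still live; the tests above still report ok, so in this run it is noise rather than a failure. The usage pattern the warning text describes is, roughly, the following; this is only an illustration of weak references to tensors, not the code that triggered the warning:

    import weakref
    import torch

    t = torch.ones(3)
    r = weakref.ref(t)      # weak reference to the Tensor's PyObject
    assert r() is t         # dereferencing yields the same object while t is alive
    del t                   # once the Tensor is gone, r() returns None
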
2022-05-18T04:43:06.7001379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:06.7001973Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:07.0079118Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:07.0079842Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:07.0080697Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:07.0081348Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:07.0122087Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:07.0125656Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:07.0126864Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:07.0225032Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:07.0374912Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:07.0380255Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:07.0380972Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:07.0392239Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:07.0477725Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:07.0489506Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:07.0971696Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:07.0972675Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:07.0976172Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:43:07.0976873Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:07.1109643Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:07.1120577Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:07.1121655Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:07.1173090Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:07.1174386Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:07.1175667Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:07.1212392Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:07.1261383Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:07.1262658Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:07.1263925Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:07.5008648Z ok (3.131s) 2022-05-18T04:43:07.5139080Z test_mixture_of_experts_offload_true_prefetch_post_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38017 2022-05-18T04:43:07.5247890Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38018 2022-05-18T04:43:08.4195295Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr7f7cnuv 2022-05-18T04:43:08.4196547Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr7f7cnuv/_remote_module_non_scriptable.py 2022-05-18T04:43:08.4415640Z dist init r=0, world=2 2022-05-18T04:43:08.4419606Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:08.4631798Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp46ia4zro 2022-05-18T04:43:08.4634805Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp46ia4zro/_remote_module_non_scriptable.py 2022-05-18T04:43:08.4853764Z dist init r=1, world=2 2022-05-18T04:43:08.4858287Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:08.4859258Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:08.4929780Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:09.8402924Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:09.8403454Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:10.1451625Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:10.1452365Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:10.1453279Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:10.1453915Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:10.1494537Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:10.1495226Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:10.1495951Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:10.1496643Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:10.1643021Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:10.1645440Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:10.1646117Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:10.1657461Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:10.1745918Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:10.1756733Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:10.2501298Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:10.2502043Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:10.2503713Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:10.2504697Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:10.2636209Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:10.2641800Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:10.2642760Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:10.2688202Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:10.2689483Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:10.2691601Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:10.2738719Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:10.2783879Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:43:10.2785171Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:10.2786424Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:10.6329829Z ok (3.132s) 2022-05-18T04:43:10.6455594Z test_mixture_of_experts_offload_true_prefetch_post_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38126 2022-05-18T04:43:10.6559519Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38127 2022-05-18T04:43:11.5466061Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo829bjnw 2022-05-18T04:43:11.5466916Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo829bjnw/_remote_module_non_scriptable.py 2022-05-18T04:43:11.5522445Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz_uc7n7h 2022-05-18T04:43:11.5525575Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz_uc7n7h/_remote_module_non_scriptable.py 2022-05-18T04:43:11.5685445Z dist init r=0, world=2 2022-05-18T04:43:11.5689532Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:11.5754983Z dist init r=1, world=2 2022-05-18T04:43:11.5759343Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:11.5760524Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:11.5792989Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:12.9480083Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:12.9480656Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:13.2553216Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:13.2553928Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:13.2563107Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:43:13.2563741Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:13.2596261Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:13.2610368Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:13.2611602Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:13.2699163Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:13.2842559Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:13.2848033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:13.2849103Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:13.2860530Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:13.2946145Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:13.2957346Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:13.3727361Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:13.3728134Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:13.3729278Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:13.3729954Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:13.3865095Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:13.3875144Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:13.3876167Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:13.3924100Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:43:13.3925539Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:13.3926928Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:13.3967703Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:13.4013372Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:13.4014667Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:13.4015933Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:13.7645141Z ok (3.131s) 2022-05-18T04:43:13.7770892Z test_mixture_of_experts_offload_true_prefetch_post_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38235 2022-05-18T04:43:13.7874725Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38236 2022-05-18T04:43:14.6657185Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpen58armo 2022-05-18T04:43:14.6658593Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpen58armo/_remote_module_non_scriptable.py 2022-05-18T04:43:14.6778281Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp418zl3mz 2022-05-18T04:43:14.6781011Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp418zl3mz/_remote_module_non_scriptable.py 2022-05-18T04:43:14.6876114Z dist init r=1, world=2 2022-05-18T04:43:14.6880079Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:14.7000348Z dist init r=0, world=2 2022-05-18T04:43:14.7004573Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:14.7005929Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:14.7085547Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
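The FutureWarning repeated above comes from torch/testing/_deprecated.py: torch.testing.assert_allclose() is deprecated since 1.12 and scheduled for removal in 1.14, with torch.testing.assert_close() as the replacement (see the linked issue 61844). A minimal sketch of the migration, assuming nothing about this test suite; the tensors and tolerances below are illustrative only:

```python
import torch
from torch.testing import assert_close

actual = torch.tensor([1.0, 2.0, 3.0])
expected = torch.tensor([1.0, 2.0, 3.0 + 1e-7])

# Old call (deprecated since 1.12, removed in 1.14):
# torch.testing.assert_allclose(actual, expected)

# New API; rtol/atol are optional keyword-only arguments and default to
# dtype-based tolerances when omitted.
assert_close(actual, expected, rtol=1.3e-6, atol=1e-5)
```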
2022-05-18T04:43:16.0444632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:16.0445149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:16.3549457Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:16.3550179Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:16.3551041Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:16.3551698Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:16.3591849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:16.3592372Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:16.3593066Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:16.3594689Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:16.3744355Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:16.3745501Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:16.3746843Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:16.3757717Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:16.3847300Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:16.3858155Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:16.4332470Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:16.4333154Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:16.4334830Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:43:16.4335489Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:16.4467830Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:16.4474100Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:16.4474779Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:16.4522719Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:16.4524116Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:16.4525410Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:16.4570823Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:16.4618569Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:16.4619853Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:16.4621129Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:16.7950935Z ok (3.031s) 2022-05-18T04:43:16.8076657Z test_mixture_of_experts_offload_true_prefetch_post_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38328 2022-05-18T04:43:16.8180808Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38329 2022-05-18T04:43:17.7167585Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp10nzcgx9 2022-05-18T04:43:17.7168540Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp10nzcgx9/_remote_module_non_scriptable.py 2022-05-18T04:43:17.7209806Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw1c65zz0 2022-05-18T04:43:17.7212917Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw1c65zz0/_remote_module_non_scriptable.py 2022-05-18T04:43:17.7389192Z dist init r=0, world=2 2022-05-18T04:43:17.7393251Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:17.7440770Z dist init r=1, world=2 2022-05-18T04:43:17.7445476Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:17.7446834Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:17.7497427Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:19.1225982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:19.1226542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:19.4298949Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:19.4299694Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:19.4371655Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:19.4372342Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:19.4402292Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:19.4418201Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:19.4419229Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:19.4505330Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:19.4652142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:19.4659353Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:19.4660321Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:19.4673059Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:19.4755626Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:19.4766712Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:19.5250111Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:19.5251017Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:19.5253498Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:19.5254185Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:19.5397278Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:19.5400771Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:19.5401774Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:19.5457017Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:19.5458494Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:19.5459842Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:19.5500320Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:19.5550812Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:43:19.5552098Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:19.5553398Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:19.9261707Z ok (3.131s) 2022-05-18T04:43:19.9388211Z test_mixture_of_experts_offload_true_prefetch_post_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38421 2022-05-18T04:43:19.9493660Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38422 2022-05-18T04:43:20.8530661Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnwmqeohs 2022-05-18T04:43:20.8532163Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnwmqeohs/_remote_module_non_scriptable.py 2022-05-18T04:43:20.8601596Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph436bsy0 2022-05-18T04:43:20.8604463Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph436bsy0/_remote_module_non_scriptable.py 2022-05-18T04:43:20.8761205Z dist init r=1, world=2 2022-05-18T04:43:20.8765664Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:20.8823448Z dist init r=0, world=2 2022-05-18T04:43:20.8828223Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:20.8829179Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:20.8869137Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:22.2767465Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:22.2767990Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:22.5894682Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:22.5895408Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:22.5900695Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:43:22.5901350Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:22.5937552Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:22.5947227Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:22.5948004Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:22.6040613Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:22.6187742Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:22.6192989Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:22.6193995Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:22.6204475Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:22.6290539Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:22.6301794Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:22.6781755Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:22.6782447Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:22.6785079Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:22.6785748Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:22.6919004Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:22.6929179Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:22.6929862Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:22.6981428Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:43:22.6982725Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:22.6984215Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:22.7022158Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:22.7069657Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:22.7070955Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:22.7072230Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:23.0573892Z ok (3.131s) 2022-05-18T04:43:23.0701742Z test_mixture_of_experts_offload_true_prefetch_post_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38514 2022-05-18T04:43:23.0807444Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38515 2022-05-18T04:43:23.9843300Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdq8cti4p 2022-05-18T04:43:23.9844352Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdq8cti4p/_remote_module_non_scriptable.py 2022-05-18T04:43:23.9856845Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpen4yk3fa 2022-05-18T04:43:23.9859632Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpen4yk3fa/_remote_module_non_scriptable.py 2022-05-18T04:43:24.0072954Z dist init r=0, world=2 2022-05-18T04:43:24.0077041Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:24.0082421Z dist init r=1, world=2 2022-05-18T04:43:24.0086592Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:24.0087607Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:24.0180840Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
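The UserWarning from fully_sharded_data_parallel.py:912 fires when a module still on CPU is handed to FSDP, which then moves it to the current CUDA device for parameter verification, flattening, and sharding, and moves it back afterwards. A minimal single-process sketch of avoiding that round trip by placing the module on the device before wrapping; the Linear model, world size of 1, and master address/port are illustrative assumptions, not taken from this test suite, and the snippet needs one CUDA device plus NCCL to run:

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Illustrative single-process setup (the tests in this log spawn two ranks
# via torch.testing._internal.common_distributed instead).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("nccl", rank=0, world_size=1)

model = torch.nn.Linear(8, 8)

# Wrapping a CPU module triggers the "Module is input on CPU" warning;
# moving it to the current device first avoids it.
model = model.cuda(torch.cuda.current_device())
fsdp_model = FSDP(model)

out = fsdp_model(torch.randn(4, 8, device="cuda"))
dist.destroy_process_group()
```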
2022-05-18T04:43:25.4027018Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:25.4027564Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:25.7115753Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:25.7116460Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:25.7117294Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:25.7118224Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:25.7159889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:25.7161898Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:25.7162574Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:25.7163250Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:25.7313599Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:25.7314892Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:25.7315680Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:25.7326879Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:25.7416325Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:25.7427029Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:25.7895448Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:25.7896147Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:25.7897093Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:43:25.7897738Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:25.8033341Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:25.8033862Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:25.8034546Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:25.8035235Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:25.8082752Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:25.8084048Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:25.8085326Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:25.8086810Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:25.8088105Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:25.8089361Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:26.1888966Z ok (3.131s) 2022-05-18T04:43:26.2018704Z test_mixture_of_experts_offload_true_prefetch_pre_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
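The repeated [W python_variable.cpp:205] lines are emitted when a Tensor is deallocated while its Python wrapper (PyObject) is still referenced, and the message names the private Tensor._fix_weakref() hook that the warning expects to be called after dereferencing. A minimal sketch of what "taking out a weak reference to a Tensor" looks like on the Python side; this does not by itself reproduce the warning and is only meant to illustrate the pattern the message refers to:

```python
import weakref
import torch

t = torch.ones(3)
r = weakref.ref(t)   # weak reference to the Tensor's PyObject

print(r() is t)      # True while a strong reference is alive
del t                # drop the last strong Python reference
print(r())           # None once the PyObject has been collected
```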
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38607 2022-05-18T04:43:26.2124412Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38608 2022-05-18T04:43:27.1201792Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3sspfdhm 2022-05-18T04:43:27.1202703Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3sspfdhm/_remote_module_non_scriptable.py 2022-05-18T04:43:27.1220153Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpni9z68am 2022-05-18T04:43:27.1223072Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpni9z68am/_remote_module_non_scriptable.py 2022-05-18T04:43:27.1421350Z dist init r=0, world=2 2022-05-18T04:43:27.1425539Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:27.1450867Z dist init r=1, world=2 2022-05-18T04:43:27.1455524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:27.1456760Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:27.1529004Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:28.5148140Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:28.5148649Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:28.8280085Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:28.8280891Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:28.8290273Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:28.8290926Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:28.8323338Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:28.8337151Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:28.8338412Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:28.8426374Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:28.8570765Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:28.8577194Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:28.8577908Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:28.8588556Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:28.8674108Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:28.8685571Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:28.9442514Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:28.9443230Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:28.9446061Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:28.9446761Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:28.9583376Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:28.9593758Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:28.9594869Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:28.9643177Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:28.9644634Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:28.9645919Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:28.9686988Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:28.9732635Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:43:28.9734314Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:28.9735608Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:29.4209929Z ok (3.232s) 2022-05-18T04:43:29.4342839Z test_mixture_of_experts_offload_true_prefetch_pre_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38716 2022-05-18T04:43:29.4451736Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38717 2022-05-18T04:43:30.3518763Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwjeysxtq 2022-05-18T04:43:30.3520269Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwjeysxtq/_remote_module_non_scriptable.py 2022-05-18T04:43:30.3727618Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoq2kjrnm 2022-05-18T04:43:30.3730252Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoq2kjrnm/_remote_module_non_scriptable.py 2022-05-18T04:43:30.3749719Z dist init r=1, world=2 2022-05-18T04:43:30.3754345Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:30.3947228Z dist init r=0, world=2 2022-05-18T04:43:30.3951215Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:30.3952812Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:30.3959482Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:31.7850208Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:31.7851087Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:32.0956991Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:32.0957707Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:32.1001247Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:43:32.1001921Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:32.1027435Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:32.1046535Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:32.1047857Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:32.1130898Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:32.1273629Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:32.1280440Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:32.1281652Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:32.1292527Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:32.1376936Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:32.1387932Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:32.2137300Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:32.2138740Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:32.2139794Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:32.2140465Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:32.2273533Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:32.2281512Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:32.2282926Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:32.2329516Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:43:32.2331095Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:32.2332383Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:32.2376789Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:32.2422094Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:32.2423383Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:32.2424959Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:32.6537078Z ok (3.232s) 2022-05-18T04:43:32.6666363Z test_mixture_of_experts_offload_true_prefetch_pre_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38825 2022-05-18T04:43:32.6775068Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38826 2022-05-18T04:43:33.5909451Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9zt38bu6 2022-05-18T04:43:33.5911187Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9zt38bu6/_remote_module_non_scriptable.py 2022-05-18T04:43:33.6140162Z dist init r=1, world=2 2022-05-18T04:43:33.6144723Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:33.6209359Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp_4q9gut 2022-05-18T04:43:33.6212111Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp_4q9gut/_remote_module_non_scriptable.py 2022-05-18T04:43:33.6430751Z dist init r=0, world=2 2022-05-18T04:43:33.6435189Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:33.6436052Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:33.6452784Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
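Each test above begins with a pair of "dist init r=N, world=2" lines followed by the store-based barrier INFO messages: torch.distributed.init_process_group writes a store_based_barrier_key entry to the shared store and waits until all ranks have reported in, which is what produces the "Added key" / "Completed store-based barrier" pairs. A minimal two-process sketch of the same handshake, using the gloo backend so it runs on CPU; the address and port are illustrative, not taken from the test harness:

```python
import os
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29501"
    # init_process_group runs the store-based barrier that emits the
    # "store_based_barrier_key" INFO lines seen in this log.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    print(f"dist init r={rank}, world={world_size}")
    dist.barrier()
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```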
2022-05-18T04:43:35.0110116Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:35.0110644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:35.3228035Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:35.3228741Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:35.3245189Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:35.3245836Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:35.3271318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:35.3292017Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:35.3293138Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:35.3374249Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:35.3518467Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:35.3523810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:35.3524499Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:35.3535591Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:35.3621710Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:35.3632951Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:35.4118352Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:35.4119046Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:35.4121944Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:43:35.4122616Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:35.4254400Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:35.4265609Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:35.4266417Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:35.4316863Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:35.4318170Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:35.4319426Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:35.4357533Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:35.4404898Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:35.4406185Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:35.4407451Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:35.7855256Z ok (3.132s) 2022-05-18T04:43:35.7984794Z test_mixture_of_experts_offload_true_prefetch_pre_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 38918 2022-05-18T04:43:35.8091833Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 38919 2022-05-18T04:43:36.7272297Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj7ppx5__ 2022-05-18T04:43:36.7273496Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj7ppx5__/_remote_module_non_scriptable.py 2022-05-18T04:43:36.7445116Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk64fiu7_ 2022-05-18T04:43:36.7447827Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk64fiu7_/_remote_module_non_scriptable.py 2022-05-18T04:43:36.7502780Z dist init r=1, world=2 2022-05-18T04:43:36.7507447Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:36.7669412Z dist init r=0, world=2 2022-05-18T04:43:36.7673432Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:36.7674539Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:36.7712700Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:38.1556155Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:38.1556686Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:38.4668986Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:38.4681728Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:38.4682619Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:38.4683272Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:38.4712234Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:38.4729136Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:38.4730359Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:38.4815596Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:38.4963145Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:38.4968989Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:38.4969859Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:38.4981471Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:38.5066201Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:38.5077663Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:38.5565733Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:38.5566428Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:38.5569130Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:38.5569804Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:38.5703248Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:38.5713089Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:38.5714091Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:38.5765323Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:38.5766640Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:38.5767902Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:38.5806346Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:38.5855803Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:43:38.5857100Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:38.5858373Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:38.9173821Z ok (3.132s) 2022-05-18T04:43:38.9299970Z test_mixture_of_experts_offload_true_prefetch_pre_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39011 2022-05-18T04:43:38.9403787Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39012 2022-05-18T04:43:39.8392593Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9nhkwn2s 2022-05-18T04:43:39.8394522Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9nhkwn2s/_remote_module_non_scriptable.py 2022-05-18T04:43:39.8623381Z dist init r=1, world=2 2022-05-18T04:43:39.8628230Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:39.8732775Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvs6el1iy 2022-05-18T04:43:39.8735492Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvs6el1iy/_remote_module_non_scriptable.py 2022-05-18T04:43:39.8954116Z dist init r=0, world=2 2022-05-18T04:43:39.8958623Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:39.8959431Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:39.9036321Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:41.2893470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:41.2894013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:41.5972924Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:41.5973644Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:41.6025279Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:43:41.6025947Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:41.6052777Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:41.6069987Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:41.6070979Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:41.6156639Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:41.6302748Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:41.6312106Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:41.6313578Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:41.6326315Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:41.6406272Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:41.6417782Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:41.6888563Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:41.6890014Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:41.6892247Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:41.6893513Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:41.7023206Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:41.7029116Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:41.7030460Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:41.7080405Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:43:41.7082955Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:41.7085404Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:41.7126582Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:41.7175258Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:41.7178360Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:41.7181052Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:42.0485663Z ok (3.131s) 2022-05-18T04:43:42.0613163Z test_mixture_of_experts_offload_true_prefetch_pre_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39104 2022-05-18T04:43:42.0717566Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39105 2022-05-18T04:43:42.9394197Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2a7thv1s 2022-05-18T04:43:42.9395766Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2a7thv1s/_remote_module_non_scriptable.py 2022-05-18T04:43:42.9625536Z dist init r=1, world=2 2022-05-18T04:43:42.9630384Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:42.9677326Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgexlc7cl 2022-05-18T04:43:42.9679707Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgexlc7cl/_remote_module_non_scriptable.py 2022-05-18T04:43:42.9900507Z dist init r=0, world=2 2022-05-18T04:43:42.9904583Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:42.9905657Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:42.9937851Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
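The FutureWarning repeated throughout these tests comes from torch.testing.assert_allclose(), and the warning itself names the replacement API. A minimal migration sketch follows; the tensor values are hypothetical and only illustrate the call-site change suggested at https://github.com/pytorch/pytorch/issues/61844:

    import torch
    from torch.testing import assert_close

    expected = torch.tensor([1.0, 2.0, 3.0])
    actual = expected + 1e-7  # hypothetical values, within default float32 tolerances

    # Deprecated since 1.12 (what these tests still call, producing the FutureWarning):
    # torch.testing.assert_allclose(actual, expected)

    # Replacement named in the warning; rtol/atol can be passed explicitly if needed.
    assert_close(actual, expected)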
2022-05-18T04:43:44.3839702Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:44.3840683Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:44.6876278Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:44.6877023Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:44.6907298Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:44.6907953Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:44.6934631Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:44.6951943Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:44.6953149Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:44.7038016Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:44.7187208Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:44.7195326Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:44.7196250Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:44.7207231Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:44.7290031Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:44.7301408Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:44.7778748Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:44.7779970Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:44.7781053Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:43:44.7781979Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:44.7920311Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:43:44.7920850Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:43:44.7921826Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:44.7922557Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:43:44.7971588Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:44.7973169Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:44.7974442Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:44.7975745Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:44.7977008Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:44.7978270Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:45.1809567Z ok (3.132s) 2022-05-18T04:43:45.1948183Z test_mixture_of_experts_with_delay_before_free_offload_false_none_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39197 2022-05-18T04:43:45.2056571Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39198 2022-05-18T04:43:46.1070541Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprt8drhfy 2022-05-18T04:43:46.1071589Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprt8drhfy/_remote_module_non_scriptable.py 2022-05-18T04:43:46.1103775Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptv9dylf7 2022-05-18T04:43:46.1106410Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptv9dylf7/_remote_module_non_scriptable.py 2022-05-18T04:43:46.1290811Z dist init r=0, world=2 2022-05-18T04:43:46.1295134Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:46.1333381Z dist init r=1, world=2 2022-05-18T04:43:46.1338000Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:46.1339288Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:46.1398596Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:47.5034435Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:47.5034967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:47.8158441Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:47.8159156Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:47.8160028Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:47.8160668Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:47.8202626Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:47.8204033Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:47.8204729Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:47.8205405Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:47.8803484Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:47.8804227Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:47.8809346Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:47.8810010Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:47.8948841Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:47.8959903Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:47.8960603Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:47.9052040Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:48.3152406Z ok (3.134s) 2022-05-18T04:43:48.3282451Z test_mixture_of_experts_with_delay_before_free_offload_false_none_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39304 2022-05-18T04:43:48.3390887Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39305 2022-05-18T04:43:49.2352355Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi_h147em 2022-05-18T04:43:49.2353603Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi_h147em/_remote_module_non_scriptable.py 2022-05-18T04:43:49.2427491Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz7fnngg7 2022-05-18T04:43:49.2430720Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz7fnngg7/_remote_module_non_scriptable.py 2022-05-18T04:43:49.2574639Z dist init r=1, world=2 2022-05-18T04:43:49.2578571Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:49.2650077Z dist init r=0, world=2 2022-05-18T04:43:49.2654819Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:49.2655614Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:49.2681587Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:50.6505279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:50.6505820Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:50.9560937Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:50.9562181Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:50.9612818Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:43:50.9614135Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:50.9639912Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:50.9660531Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:50.9661881Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:50.9743467Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:51.4672711Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:51.4674127Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:51.4676013Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:51.4677256Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:51.4812678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:51.4814751Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:51.4816385Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:51.4916096Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:51.8470181Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:51.8473233Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:51.8475822Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:51.8478524Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:51.8481181Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:51.8483694Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:52.1485034Z ok (3.833s) 2022-05-18T04:43:52.1611677Z test_mixture_of_experts_with_delay_before_free_offload_false_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39395 2022-05-18T04:43:52.1715526Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39396 2022-05-18T04:43:53.0698696Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmcs58fbi 2022-05-18T04:43:53.0699827Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmcs58fbi/_remote_module_non_scriptable.py 2022-05-18T04:43:53.0717308Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdsggv1sg 2022-05-18T04:43:53.0720040Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdsggv1sg/_remote_module_non_scriptable.py 2022-05-18T04:43:53.0926864Z dist init r=1, world=2 2022-05-18T04:43:53.0931353Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:53.0942348Z dist init r=0, world=2 2022-05-18T04:43:53.0946554Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:53.0947566Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:53.1035039Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:54.4611058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:54.4611578Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:54.7801052Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:54.7802086Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:54.7802955Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:54.7803701Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:54.7845032Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:54.7845542Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:54.7846237Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
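The UserWarning from fully_sharded_data_parallel.py:912 fires when a CPU-resident module is handed to FSDP, which then moves it to the current CUDA device and back. A minimal sketch of avoiding that round trip by moving the module first; the Linear module is illustrative (the tests wrap a mixture-of-experts model), and an initialized process group is assumed, as the test harness provides:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Illustrative module standing in for the model under test.
    module = nn.Linear(8, 8)

    # Passing the CPU module directly triggers the warning seen in this log:
    #   fsdp = FSDP(module)
    # Moving it to the current device first skips the temporary move-and-restore.
    device = torch.cuda.current_device()
    fsdp = FSDP(module.to(device))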
2022-05-18T04:43:54.7846937Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:54.8200603Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:54.8201297Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:54.8202229Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:54.8202884Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:54.8343414Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:54.8344664Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:54.8345345Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:54.8446298Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:54.8685279Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:54.8686590Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:54.8687867Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:54.8689130Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:54.8690794Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:54.8692302Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:43:55.1795445Z ok (3.031s) 2022-05-18T04:43:55.1919969Z test_mixture_of_experts_with_delay_before_free_offload_false_prefetch_post_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39486 2022-05-18T04:43:55.2025490Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39487 2022-05-18T04:43:56.1091523Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2v9_v4mg 2022-05-18T04:43:56.1092378Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2v9_v4mg/_remote_module_non_scriptable.py 2022-05-18T04:43:56.1101136Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw1ohp97o 2022-05-18T04:43:56.1103831Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw1ohp97o/_remote_module_non_scriptable.py 2022-05-18T04:43:56.1315439Z dist init r=1, world=2 2022-05-18T04:43:56.1319596Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:56.1332904Z dist init r=0, world=2 2022-05-18T04:43:56.1337341Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:56.1338352Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:56.1423188Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:57.5138565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:43:57.5139126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:43:57.8220680Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:57.8221406Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:57.8305791Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:43:57.8306425Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:43:57.8331593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:43:57.8352781Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:43:57.8353519Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:57.8434674Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:43:57.9025180Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:43:57.9025893Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:57.9034259Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:43:57.9035252Z warnings.warn(msg, FutureWarning) 2022-05-18T04:43:57.9169791Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:43:57.9182023Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:43:57.9182738Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:57.9272823Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:43:58.3107230Z ok (3.131s) 2022-05-18T04:43:58.3236796Z test_mixture_of_experts_with_delay_before_free_offload_false_prefetch_post_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39593 2022-05-18T04:43:58.3343886Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39594 2022-05-18T04:43:59.2294739Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp317zl67w 2022-05-18T04:43:59.2295896Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp317zl67w/_remote_module_non_scriptable.py 2022-05-18T04:43:59.2314955Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr2di5lrj 2022-05-18T04:43:59.2317810Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr2di5lrj/_remote_module_non_scriptable.py 2022-05-18T04:43:59.2515691Z dist init r=0, world=2 2022-05-18T04:43:59.2519831Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:43:59.2546303Z dist init r=1, world=2 2022-05-18T04:43:59.2550700Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:43:59.2551990Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:43:59.2623294Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:00.6429553Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:00.6430129Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:00.9463429Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:00.9464136Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:00.9547690Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:44:00.9548343Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:00.9573492Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:00.9593996Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:00.9594731Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:00.9676552Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:01.4662401Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:01.4663417Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:01.4664458Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:01.4665130Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:01.4801584Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:01.4811717Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:01.4812412Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:01.4904634Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:01.8486292Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:01.8487629Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:01.8488877Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:01.8490164Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:01.8491607Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:01.8492877Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:02.1437109Z ok (3.833s) 2022-05-18T04:44:02.1574094Z test_mixture_of_experts_with_delay_before_free_offload_false_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39684 2022-05-18T04:44:02.1682403Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39685 2022-05-18T04:44:03.0691007Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk2f32r8b 2022-05-18T04:44:03.0692181Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk2f32r8b/_remote_module_non_scriptable.py 2022-05-18T04:44:03.0693766Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0yx36akp 2022-05-18T04:44:03.0697001Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0yx36akp/_remote_module_non_scriptable.py 2022-05-18T04:44:03.0914970Z dist init r=1, world=2 2022-05-18T04:44:03.0919483Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:03.0923507Z dist init r=0, world=2 2022-05-18T04:44:03.0928132Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:03.0929540Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:03.1023207Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:04.4981412Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:04.4981959Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:04.8083643Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:04.8084354Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:04.8085193Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:04.8085837Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:04.8127373Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:04.8129558Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:04.8130760Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2022-05-18T04:44:04.8230989Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:04.8597147Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:04.8598576Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:04.8600500Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:04.8601746Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:04.8738991Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:04.8747362Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:04.8748299Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:04.8841814Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:04.9088441Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:04.9091463Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:04.9094458Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:04.9097143Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:04.9099848Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:04.9102418Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:05.2763949Z ok (3.132s) 2022-05-18T04:44:05.2889000Z test_mixture_of_experts_with_delay_before_free_offload_false_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39775 2022-05-18T04:44:05.2993296Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39776 2022-05-18T04:44:06.2050464Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdmhcnkpt 2022-05-18T04:44:06.2051956Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdmhcnkpt/_remote_module_non_scriptable.py 2022-05-18T04:44:06.2282143Z dist init r=1, world=2 2022-05-18T04:44:06.2286422Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:06.2333787Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1cx1a4lj 2022-05-18T04:44:06.2336433Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1cx1a4lj/_remote_module_non_scriptable.py 2022-05-18T04:44:06.2555352Z dist init r=0, world=2 2022-05-18T04:44:06.2559455Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:06.2560587Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:06.2593194Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:07.6194378Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:07.6194934Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:07.9254388Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:07.9255110Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:07.9274868Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:07.9275842Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:07.9301249Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:07.9319179Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:07.9320009Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:07.9404509Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:08.0002243Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:44:08.0002981Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:08.0003934Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:08.0004597Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:08.0142894Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:08.0145865Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:08.0146936Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:08.0245718Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:08.4075377Z ok (3.131s) 2022-05-18T04:44:08.4202536Z test_mixture_of_experts_with_delay_before_free_offload_false_prefetch_pre_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39882 2022-05-18T04:44:08.4306160Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39883 2022-05-18T04:44:09.3363256Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb4jlvxca 2022-05-18T04:44:09.3364454Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb4jlvxca/_remote_module_non_scriptable.py 2022-05-18T04:44:09.3594034Z dist init r=1, world=2 2022-05-18T04:44:09.3598319Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:09.3666327Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbkfs657b 2022-05-18T04:44:09.3669421Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbkfs657b/_remote_module_non_scriptable.py 2022-05-18T04:44:09.3886865Z dist init r=0, world=2 2022-05-18T04:44:09.3891788Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:09.3892568Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:09.3905140Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:10.7730535Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:10.7731303Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:11.0835803Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:11.0836832Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:11.0909625Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:44:11.0910428Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:11.0935126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:11.0956549Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:11.0957629Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:11.1038389Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:11.6045901Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:11.6046733Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:11.6048642Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:11.6049299Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:11.6185442Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:11.6197262Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:11.6197968Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:11.6288371Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:12.0622549Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:12.0623854Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:12.0625121Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:12.0626400Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:12.0627662Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:12.0629117Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:12.3401440Z ok (3.932s) 2022-05-18T04:44:12.3527230Z test_mixture_of_experts_with_delay_before_free_offload_false_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 39973 2022-05-18T04:44:12.3632978Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 39974 2022-05-18T04:44:13.2713734Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp08wkdw4e 2022-05-18T04:44:13.2714900Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp08wkdw4e/_remote_module_non_scriptable.py 2022-05-18T04:44:13.2759995Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjv1vy4z8 2022-05-18T04:44:13.2762817Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjv1vy4z8/_remote_module_non_scriptable.py 2022-05-18T04:44:13.2944887Z dist init r=1, world=2 2022-05-18T04:44:13.2949149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:13.2982534Z dist init r=0, world=2 2022-05-18T04:44:13.2986863Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:13.2988063Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:13.3052719Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:14.6619458Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:14.6620006Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:14.9665978Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:14.9666675Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:14.9715539Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:14.9716183Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:14.9741358Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:14.9759960Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:14.9760743Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
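[editor's note] The UserWarning from fully_sharded_data_parallel.py:912 above ("Module is input on CPU, we are moving it to N to perform parameter verification, flattening, sharding, and will move it back after") is emitted when FSDP is handed a module whose parameters still live on the CPU. A minimal sketch of how a caller can avoid that round-trip by placing the module on the local CUDA device before wrapping; the toy module, local_rank value, and already-initialized process group are assumptions for illustration and are not the test's actual code.

    # Minimal sketch (assumes torch.distributed is already initialized and CUDA is available).
    # Moving the module to the target GPU first avoids the
    # "Module is input on CPU, we are moving it to ..." UserWarning seen in the log above.
    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    local_rank = 0                      # hypothetical; normally derived from the launcher environment
    torch.cuda.set_device(local_rank)

    model = nn.Linear(8, 8)             # toy stand-in for the mixture-of-experts module under test
    model = model.cuda(local_rank)      # place parameters on the target device before wrapping
    fsdp_model = FSDP(model)            # FSDP then shards in place, with no CPU -> GPU -> CPU shuffle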
2022-05-18T04:44:14.9844471Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:15.0196175Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:15.0196846Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:15.0197781Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:15.0198664Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:15.0336744Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:15.0337375Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:15.0338094Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:15.0338805Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:15.0581133Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:15.0582448Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:15.0583722Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:15.0584992Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:15.0586245Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:15.0587507Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:15.3710911Z ok (3.031s) 2022-05-18T04:44:15.3847543Z test_mixture_of_experts_with_delay_before_free_offload_true_none_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40064 2022-05-18T04:44:15.3953122Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40065 2022-05-18T04:44:16.3002346Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbcqibd63 2022-05-18T04:44:16.3003360Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbcqibd63/_remote_module_non_scriptable.py 2022-05-18T04:44:16.3003915Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpspc2wxfz 2022-05-18T04:44:16.3006688Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpspc2wxfz/_remote_module_non_scriptable.py 2022-05-18T04:44:16.3225598Z dist init r=0, world=2 2022-05-18T04:44:16.3229674Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:16.3232111Z dist init r=1, world=2 2022-05-18T04:44:16.3236442Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:16.3237459Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:16.3333278Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:17.7157681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:17.7158290Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:18.0227285Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:18.0228086Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:18.0244067Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:18.0244747Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:18.0269918Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:18.0290554Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:18.0291560Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:18.0372907Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
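[editor's note] The test names in this run (..._offload_{false,true}_{none,prefetch_pre,prefetch_post}_{no_shard,none,shard_grad_op}) read like a grid over FSDP's public configuration knobs: CPU offload, backward prefetch, and sharding strategy. A hedged sketch of how such a combination maps onto the FSDP constructor; the test suite's internal wiring is not visible in the log, so the model and the specific choices below are illustrative only.

    # Illustrative mapping of the test-name suffixes onto FSDP's public options.
    # Assumes an initialized process group and CUDA; the model is a toy stand-in.
    import torch.nn as nn
    from torch.distributed.fsdp import (
        FullyShardedDataParallel as FSDP,
        CPUOffload,
        BackwardPrefetch,
        ShardingStrategy,
    )

    model = nn.Linear(8, 8).cuda()
    fsdp_model = FSDP(
        model,
        cpu_offload=CPUOffload(offload_params=True),        # "offload_true" vs "offload_false"
        backward_prefetch=BackwardPrefetch.BACKWARD_PRE,    # "prefetch_pre" / "prefetch_post" / none
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,   # "shard_grad_op" / "no_shard" / full shard
    )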
2022-05-18T04:44:18.0514421Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:18.0519905Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:18.0520592Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:18.0531323Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:18.0617557Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:18.0628503Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:18.1384982Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:18.1385697Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:18.1390871Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:18.1391555Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:18.1525174Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:44:18.1537031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:44:18.1537727Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:18.1628323Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:18.6038802Z ok (3.233s) 2022-05-18T04:44:18.6169807Z test_mixture_of_experts_with_delay_before_free_offload_true_none_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40173 2022-05-18T04:44:18.6277121Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40174 2022-05-18T04:44:19.5528477Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2a_1eao7 2022-05-18T04:44:19.5529494Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2a_1eao7/_remote_module_non_scriptable.py 2022-05-18T04:44:19.5688587Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2otaibjy 2022-05-18T04:44:19.5691296Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2otaibjy/_remote_module_non_scriptable.py 2022-05-18T04:44:19.5760237Z dist init r=1, world=2 2022-05-18T04:44:19.5764667Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:19.5910748Z dist init r=0, world=2 2022-05-18T04:44:19.5915061Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:19.5915829Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:19.5970007Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:20.9790901Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:20.9791474Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:21.2871883Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:21.2873214Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:21.2930192Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:21.2931090Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:21.2956611Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:21.2975514Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:21.2976462Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:21.3059631Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:21.3203482Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:21.3211359Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:21.3212340Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:21.3223410Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:21.3306601Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:21.3317431Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:21.8827368Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:21.8828074Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:21.8830446Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:21.8831110Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:21.8963554Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:44:21.8969170Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:44:21.8969919Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:21.9066552Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:22.7377863Z ok (4.134s) 2022-05-18T04:44:22.7505096Z test_mixture_of_experts_with_delay_before_free_offload_true_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40266 2022-05-18T04:44:22.7610972Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40267 2022-05-18T04:44:23.6694217Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzf2ieb1d 2022-05-18T04:44:23.6695362Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzf2ieb1d/_remote_module_non_scriptable.py 2022-05-18T04:44:23.6721243Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptopjpa9z 2022-05-18T04:44:23.6724175Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptopjpa9z/_remote_module_non_scriptable.py 2022-05-18T04:44:23.6914140Z dist init r=0, world=2 2022-05-18T04:44:23.6918402Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:23.6952074Z dist init r=1, world=2 2022-05-18T04:44:23.6956731Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:23.6958104Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
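[editor's note] The FutureWarning repeated throughout this run points at a documented migration: torch.testing.assert_allclose() is deprecated since 1.12 in favor of torch.testing.assert_close() (see the linked issue 61844). A small sketch of the replacement; the tensor values are made up, and the explicit rtol/atol are optional and shown only to make the tolerance handling visible.

    # Sketch of the migration the FutureWarning asks for: assert_allclose -> assert_close.
    import torch

    actual = torch.tensor([1.0, 2.0, 3.0])
    expected = torch.tensor([1.0, 2.0, 3.0 + 1e-7])

    # Deprecated since 1.12 (the call that triggers the warning in this log):
    # torch.testing.assert_allclose(actual, expected)

    # Replacement suggested by the warning; rtol/atol may also be omitted to use the defaults.
    torch.testing.assert_close(actual, expected, rtol=1e-5, atol=1e-6)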
2022-05-18T04:44:23.7022093Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:25.0724121Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:25.0724677Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:25.3846162Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:25.3847167Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:25.3864488Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:25.3865280Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:25.3890357Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:25.3910736Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:25.3911434Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:25.3993466Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:25.4139165Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:25.4146531Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:25.4147219Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:25.4158154Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:25.4242402Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:25.4253781Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:25.4730159Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:44:25.4731060Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:25.4733919Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:25.4734607Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:25.4867480Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:44:25.4877850Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:44:25.4879001Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:25.4929174Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:25.4930703Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:25.4932271Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:25.4970533Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:25.5019277Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:25.5020573Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:25.5021841Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:25.8695396Z ok (3.132s) 2022-05-18T04:44:25.8822739Z test_mixture_of_experts_with_delay_before_free_offload_true_prefetch_post_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40359 2022-05-18T04:44:25.8931408Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40360 2022-05-18T04:44:26.7882342Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp74g0fl_v 2022-05-18T04:44:26.7883185Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp74g0fl_v/_remote_module_non_scriptable.py 2022-05-18T04:44:26.7920315Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp81s551rk 2022-05-18T04:44:26.7923102Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp81s551rk/_remote_module_non_scriptable.py 2022-05-18T04:44:26.8111822Z dist init r=1, world=2 2022-05-18T04:44:26.8116151Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:26.8145578Z dist init r=0, world=2 2022-05-18T04:44:26.8150186Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:26.8151236Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:26.8219807Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:28.1541620Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:28.1542403Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:28.4610799Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:28.4611513Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:28.4612373Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:28.4613362Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:28.4653546Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:28.4656377Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:28.4657652Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:28.4756645Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:28.4897543Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:28.4906468Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:28.4907155Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:28.4917802Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:28.5000582Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:28.5010965Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:28.5765307Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:28.5766025Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:28.5769721Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:28.5770658Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:28.5901367Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:44:28.5907675Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:44:28.5908381Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:28.6004261Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:29.0013791Z ok (3.132s) 2022-05-18T04:44:29.0143639Z test_mixture_of_experts_with_delay_before_free_offload_true_prefetch_post_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40468 2022-05-18T04:44:29.0250105Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40469 2022-05-18T04:44:29.9173292Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4n01lve7 2022-05-18T04:44:29.9174311Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4n01lve7/_remote_module_non_scriptable.py 2022-05-18T04:44:29.9402450Z dist init r=1, world=2 2022-05-18T04:44:29.9406708Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:29.9683124Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdbqk23xz 2022-05-18T04:44:29.9685809Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdbqk23xz/_remote_module_non_scriptable.py 2022-05-18T04:44:29.9904292Z dist init r=0, world=2 2022-05-18T04:44:29.9908594Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:29.9909410Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
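[editor's note] Each test above spawns two worker processes ("Started process 0 with pid ...", "dist init r=0, world=2"), and the store_based_barrier_key INFO lines come from the rendezvous inside init_process_group, which every rank must reach before the barrier completes. A minimal two-rank harness in the same spirit; the run_worker helper, the gloo backend, and the address/port values are assumptions for illustration, not the test framework's actual launcher.

    # Hypothetical two-rank harness illustrating the "dist init r=<rank>, world=2" and
    # store-based-barrier INFO lines above; not the test suite's real code.
    import os
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def run_worker(rank: int, world_size: int) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        # init_process_group performs the store-based barrier that logs
        # "Added key: store_based_barrier_key:1 ..." / "Completed store-based barrier ...".
        dist.init_process_group("gloo", rank=rank, world_size=world_size)
        dist.barrier()
        dist.destroy_process_group()

    if __name__ == "__main__":
        mp.spawn(run_worker, args=(2,), nprocs=2)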
2022-05-18T04:44:29.9915713Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:31.3636828Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:31.3637370Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:31.6740841Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:31.6741591Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:31.6800612Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:31.6801266Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:31.6826448Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:31.6847262Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:31.6848105Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:31.6929641Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:31.7077241Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:31.7082887Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:31.7083850Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:31.7095267Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:31.7180052Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:31.7191204Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:32.2710556Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:44:32.2711245Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:32.2714047Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:32.2714984Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:32.2847858Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:44:32.2859546Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:44:32.2860480Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:32.2950737Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:33.1351192Z ok (4.134s) 2022-05-18T04:44:33.1476603Z test_mixture_of_experts_with_delay_before_free_offload_true_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40561 2022-05-18T04:44:33.1580031Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40562 2022-05-18T04:44:34.0532590Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvr807yex 2022-05-18T04:44:34.0533594Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvr807yex/_remote_module_non_scriptable.py 2022-05-18T04:44:34.0590878Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoq66eyku 2022-05-18T04:44:34.0593715Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoq66eyku/_remote_module_non_scriptable.py 2022-05-18T04:44:34.0754878Z dist init r=0, world=2 2022-05-18T04:44:34.0759088Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:34.0823271Z dist init r=1, world=2 2022-05-18T04:44:34.0827634Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:34.0828776Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:34.0862176Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:35.4459079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:35.4459615Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:35.7526271Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:35.7526967Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:35.7596739Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:44:35.7597384Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:35.7622724Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:35.7642916Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:35.7643593Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:35.7725944Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:35.7872102Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:35.7878858Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:35.7879554Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:35.7890467Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:35.7974545Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:35.7985242Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:35.8462733Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:35.8463417Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:35.8465927Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:35.8466588Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:35.8599086Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:44:35.8609457Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:44:35.8610143Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:35.8661112Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:44:35.8662410Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:35.8663689Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:35.8701927Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:35.8750107Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:35.8751388Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:35.8752892Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:36.2667296Z ok (3.131s) 2022-05-18T04:44:36.2795883Z test_mixture_of_experts_with_delay_before_free_offload_true_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40654 2022-05-18T04:44:36.2902088Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40655 2022-05-18T04:44:37.1995176Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaowhc6g5 2022-05-18T04:44:37.1996031Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaowhc6g5/_remote_module_non_scriptable.py 2022-05-18T04:44:37.2017106Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4ttqz3jt 2022-05-18T04:44:37.2020261Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4ttqz3jt/_remote_module_non_scriptable.py 2022-05-18T04:44:37.2216262Z dist init r=0, world=2 2022-05-18T04:44:37.2220593Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:37.2247053Z dist init r=1, world=2 2022-05-18T04:44:37.2251497Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:37.2252543Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:37.2324600Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
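[editor's note] The repeated [W python_variable.cpp:205] warning describes a specific pattern: a weak reference to a Tensor is dereferenced and the private _fix_weakref() helper is not called afterwards, so the tensor's PyObject bookkeeping is stale when it is later deallocated. A heavily hedged sketch of the pattern the warning text describes; Tensor._fix_weakref() is a private PyTorch helper, the exact trigger conditions depend on internals, and ordinary user code rarely needs this call.

    # Illustration of the pattern named in the python_variable.cpp warning text above.
    # _fix_weakref() is private; this is shown only to make the warning message concrete.
    import weakref
    import torch

    t = torch.ones(3)
    ref = weakref.ref(t)      # "took out a weak reference to Tensor"
    _ = ref()                 # dereference it
    t._fix_weakref()          # the follow-up call the warning says was missing
    del t                     # deallocation now proceeds without the stale-PyObject complaint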
2022-05-18T04:44:38.5900184Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:38.5900731Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:38.8953116Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:38.8953831Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:38.8999632Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:38.9000299Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:38.9025559Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:38.9044051Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:38.9045100Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:38.9128903Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:38.9268311Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:38.9276865Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:38.9277569Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:38.9288763Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:38.9371521Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:38.9382178Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:39.0141007Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:39.0141758Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:39.0142885Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:44:39.0143549Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:39.0277151Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:44:39.0282610Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:44:39.0283642Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:39.0379856Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:39.3984543Z ok (3.132s) 2022-05-18T04:44:39.4111161Z test_mixture_of_experts_with_delay_before_free_offload_true_prefetch_pre_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40763 2022-05-18T04:44:39.4217534Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40764 2022-05-18T04:44:40.3216079Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqnku3yp4 2022-05-18T04:44:40.3217378Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqnku3yp4/_remote_module_non_scriptable.py 2022-05-18T04:44:40.3447700Z dist init r=1, world=2 2022-05-18T04:44:40.3452116Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:40.3552049Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxirksgoh 2022-05-18T04:44:40.3554614Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxirksgoh/_remote_module_non_scriptable.py 2022-05-18T04:44:40.3774239Z dist init r=0, world=2 2022-05-18T04:44:40.3778528Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:40.3779637Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:40.3861288Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:41.7557492Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:41.7558035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:42.0641658Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:42.0642721Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:42.0646826Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:44:42.0647513Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:42.0683799Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:42.0693249Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:42.0694249Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:42.0786991Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:42.0931746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:42.0938525Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:42.0939381Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:42.0950390Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:42.1035194Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:42.1046027Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:42.6560008Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:42.6560711Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:42.6563263Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:42.6563945Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:42.6696814Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:44:42.6705834Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:44:42.6706980Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:42.6799954Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:43.5318749Z ok (4.133s) 2022-05-18T04:44:43.5448541Z test_mixture_of_experts_with_delay_before_free_offload_true_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40856 2022-05-18T04:44:43.5554757Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40857 2022-05-18T04:44:44.4315598Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy5j7duan 2022-05-18T04:44:44.4316669Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy5j7duan/_remote_module_non_scriptable.py 2022-05-18T04:44:44.4502933Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprt0jqvf1 2022-05-18T04:44:44.4505470Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprt0jqvf1/_remote_module_non_scriptable.py 2022-05-18T04:44:44.4536760Z dist init r=0, world=2 2022-05-18T04:44:44.4541078Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:44.4732053Z dist init r=1, world=2 2022-05-18T04:44:44.4736539Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:44.4737482Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:44.4745922Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:45.8412639Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:45.8413191Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:46.1551258Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:46.1551971Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:46.1556001Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:46.1556668Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:46.1596329Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:44:46.1601107Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:44:46.1601796Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:46.1699547Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:44:46.1850232Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 0 2022-05-18T04:44:46.1860850Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:3 to store for rank: 1 2022-05-18T04:44:46.1861563Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:46.1872743Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:46.1953260Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:3 with 2 nodes. 2022-05-18T04:44:46.1964033Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:46.2445997Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:46.2446944Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:46.2448167Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:46.2448870Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:46.2583137Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 0 2022-05-18T04:44:46.2593733Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:4 to store for rank: 1 2022-05-18T04:44:46.2594437Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:46.2645030Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:46.2646327Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:46.2647613Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:46.2686109Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:4 with 2 nodes. 2022-05-18T04:44:46.2734539Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:44:46.2735827Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:46.2737119Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:44:46.6637193Z ok (3.132s) 2022-05-18T04:44:46.6769919Z test_nested_all_wrapped_model_offload_false_none_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 40949 2022-05-18T04:44:46.6878220Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 40950 2022-05-18T04:44:47.5933806Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpshtjs0ko 2022-05-18T04:44:47.5934931Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpshtjs0ko/_remote_module_non_scriptable.py 2022-05-18T04:44:47.5937303Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp07wm_el0 2022-05-18T04:44:47.5940386Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp07wm_el0/_remote_module_non_scriptable.py 2022-05-18T04:44:47.6156978Z dist init r=0, world=2 2022-05-18T04:44:47.6161210Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:47.6169692Z dist init r=1, world=2 2022-05-18T04:44:47.6174479Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:47.6175268Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:47.6264488Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:49.0048061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:49.0048628Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:49.3139643Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:49.3148390Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:49.3168784Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:49.3169454Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:49.3179710Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:44:49.3180366Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:49.3551816Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:49.3552493Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:49.3566543Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:49.3567201Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:49.3648730Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:49.3650779Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:49.6957187Z ok (3.032s) 2022-05-18T04:44:49.7083754Z test_nested_all_wrapped_model_offload_false_none_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41036 2022-05-18T04:44:49.7187595Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41037 2022-05-18T04:44:50.5645218Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpitk6uy0q 2022-05-18T04:44:50.5645844Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpitk6uy0q/_remote_module_non_scriptable.py 2022-05-18T04:44:50.5867318Z dist init r=0, world=2 2022-05-18T04:44:50.5871517Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:50.6141245Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpif_wzak6 2022-05-18T04:44:50.6144142Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpif_wzak6/_remote_module_non_scriptable.py 2022-05-18T04:44:50.6370010Z dist init r=1, world=2 2022-05-18T04:44:50.6374870Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:50.6375810Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:50.6380927Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:52.0170552Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:52.0171307Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:52.3274875Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:52.3283782Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:52.3305377Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:44:52.3306064Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:52.3315685Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:52.3316327Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:52.3689286Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:52.3689946Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:52.3704022Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:52.3704693Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:52.3787026Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:52.3789940Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:52.7267174Z ok (3.031s) 2022-05-18T04:44:52.7392720Z test_nested_all_wrapped_model_offload_false_none_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41123 2022-05-18T04:44:52.7498781Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41124 2022-05-18T04:44:53.6499590Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcuei93_b 2022-05-18T04:44:53.6501106Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcuei93_b/_remote_module_non_scriptable.py 2022-05-18T04:44:53.6735028Z dist init r=0, world=2 2022-05-18T04:44:53.6739326Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:53.6947466Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg9yqfoo4 2022-05-18T04:44:53.6950480Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg9yqfoo4/_remote_module_non_scriptable.py 2022-05-18T04:44:53.7168436Z dist init r=1, world=2 2022-05-18T04:44:53.7173058Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:53.7174272Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:53.7249901Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:55.1111353Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:55.1111961Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:55.4244189Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:55.4244749Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
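The "Reducer buckets have been rebuilt in this iteration." lines are emitted by torch.nn.parallel.DistributedDataParallel after its first backward pass; DDP is the reference model these parity tests compare FSDP against. A minimal sketch, assuming the process group from the earlier init sketch and one visible GPU per rank:

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    model = nn.Linear(16, 16).cuda()
    ddp_model = DDP(model, device_ids=[torch.cuda.current_device()])
    # The first backward pass lets DDP size its gradient buckets, after which it
    # logs "Reducer buckets have been rebuilt in this iteration."
    ddp_model(torch.randn(8, 16, device="cuda")).sum().backward()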
2022-05-18T04:44:55.4273492Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:55.4274175Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:55.4275029Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:55.4275656Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:55.4800197Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:55.4800873Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:55.4803876Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:55.4804560Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:55.4887000Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:55.4887510Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:55.8579029Z ok (3.131s) 2022-05-18T04:44:55.8706755Z test_nested_all_wrapped_model_offload_false_none_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41210 2022-05-18T04:44:55.8812534Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41211 2022-05-18T04:44:56.7773946Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_hfqxhb3 2022-05-18T04:44:56.7775104Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_hfqxhb3/_remote_module_non_scriptable.py 2022-05-18T04:44:56.7817496Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfez843vl 2022-05-18T04:44:56.7820146Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfez843vl/_remote_module_non_scriptable.py 2022-05-18T04:44:56.7993740Z dist init r=0, world=2 2022-05-18T04:44:56.7997545Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:56.8047941Z dist init r=1, world=2 2022-05-18T04:44:56.8052403Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:56.8053677Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:56.8100951Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
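The suffixes in the test names (no_shard / shard_grad_op, prefetch_pre / prefetch_post, offload_true / offload_false) correspond to FSDP constructor options. A sketch of how such a configuration is expressed, assuming the torch.distributed.fsdp API of this release; the wrapped module and the particular combination of flags are illustrative:

    import torch.nn as nn
    from torch.distributed.fsdp import (
        BackwardPrefetch,
        CPUOffload,
        FullyShardedDataParallel as FSDP,
        ShardingStrategy,
    )

    # Assumes init_process_group() has already run on this rank.
    fsdp_model = FSDP(
        nn.Linear(16, 16).cuda(),
        sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,   # or NO_SHARD / FULL_SHARD
        backward_prefetch=BackwardPrefetch.BACKWARD_PRE,    # or BACKWARD_POST / None
        cpu_offload=CPUOffload(offload_params=False),       # offload_true -> True
    )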
2022-05-18T04:44:58.1720243Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:44:58.1722004Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:44:58.4820096Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:58.4828776Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:58.4848906Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:58.4849573Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:58.4859725Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:44:58.4860380Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:44:58.5379061Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:58.5379735Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:58.5382777Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:44:58.5383458Z warnings.warn(msg, FutureWarning) 2022-05-18T04:44:58.5463734Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:58.5466020Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:44:58.8891446Z ok (3.031s) 2022-05-18T04:44:58.9017079Z test_nested_all_wrapped_model_offload_false_none_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41297 2022-05-18T04:44:58.9121470Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41298 2022-05-18T04:44:59.8137845Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpblt9vzg8 2022-05-18T04:44:59.8139596Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpblt9vzg8/_remote_module_non_scriptable.py 2022-05-18T04:44:59.8261422Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpauf1ckcs 2022-05-18T04:44:59.8264204Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpauf1ckcs/_remote_module_non_scriptable.py 2022-05-18T04:44:59.8359853Z dist init r=1, world=2 2022-05-18T04:44:59.8363996Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:44:59.8494531Z dist init r=0, world=2 2022-05-18T04:44:59.8498985Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:44:59.8500115Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:44:59.8569360Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:01.2328142Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:01.2328705Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:01.5537491Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:01.5538048Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:01.5567246Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:01.5567952Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:01.5568805Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:01.5569438Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:01.6084198Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:01.6084878Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:01.6086654Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:01.6087315Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:01.6172161Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
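The repeated FutureWarning above points from torch.testing.assert_allclose() to torch.testing.assert_close(). A minimal migration sketch; the tensors and tolerances are illustrative:

    import torch
    from torch.testing import assert_close

    actual = torch.tensor([1.0, 2.0])
    expected = torch.tensor([1.0, 2.0 + 1e-7])
    # assert_close is the replacement named in the warning; rtol/atol may be passed
    # explicitly as here, or omitted to use dtype-based defaults.
    assert_close(actual, expected, rtol=1.3e-6, atol=1e-5)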
2022-05-18T04:45:01.6172645Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:02.0203389Z ok (3.131s) 2022-05-18T04:45:02.0332784Z test_nested_all_wrapped_model_offload_false_none_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41384 2022-05-18T04:45:02.0439168Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41385 2022-05-18T04:45:02.9431177Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz7dplhqp 2022-05-18T04:45:02.9432077Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz7dplhqp/_remote_module_non_scriptable.py 2022-05-18T04:45:02.9482346Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvt6h9cm7 2022-05-18T04:45:02.9485042Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvt6h9cm7/_remote_module_non_scriptable.py 2022-05-18T04:45:02.9651094Z dist init r=0, world=2 2022-05-18T04:45:02.9655463Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:02.9711788Z dist init r=1, world=2 2022-05-18T04:45:02.9716174Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:02.9716955Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:02.9758998Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:04.3565438Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:04.3566018Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:04.6704495Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:04.6712609Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:04.6732904Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:04.6733566Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:04.6743517Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:04.6744176Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:04.7254674Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:04.7255358Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:04.7257482Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:04.7258166Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:04.7338714Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:04.7340935Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:05.0519395Z ok (3.031s) 2022-05-18T04:45:05.0646671Z test_nested_all_wrapped_model_offload_false_prefetch_post_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41471 2022-05-18T04:45:05.0752628Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41472 2022-05-18T04:45:05.9763038Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr_s_b0vs 2022-05-18T04:45:05.9763893Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr_s_b0vs/_remote_module_non_scriptable.py 2022-05-18T04:45:05.9993014Z dist init r=1, world=2 2022-05-18T04:45:05.9997479Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:06.0181509Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6lsxrrjm 2022-05-18T04:45:06.0184152Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6lsxrrjm/_remote_module_non_scriptable.py 2022-05-18T04:45:06.0402280Z dist init r=0, world=2 2022-05-18T04:45:06.0406592Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:06.0407384Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:06.0408100Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:07.4163849Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:07.4164384Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:07.7309074Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:07.7318210Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:07.7338100Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:07.7339074Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:07.7351015Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:07.7351778Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:07.7725466Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:45:07.7726137Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:07.7741738Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:07.7742408Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:07.7825075Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:07.7826118Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:08.1836459Z ok (3.131s) 2022-05-18T04:45:08.1966637Z test_nested_all_wrapped_model_offload_false_prefetch_post_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41558 2022-05-18T04:45:08.2073702Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41559 2022-05-18T04:45:09.1030999Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0kq6t_er 2022-05-18T04:45:09.1031898Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0kq6t_er/_remote_module_non_scriptable.py 2022-05-18T04:45:09.1261614Z dist init r=1, world=2 2022-05-18T04:45:09.1266139Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:09.1424909Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_o90vx_p 2022-05-18T04:45:09.1427617Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_o90vx_p/_remote_module_non_scriptable.py 2022-05-18T04:45:09.1645496Z dist init r=0, world=2 2022-05-18T04:45:09.1649943Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:09.1650728Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:09.1674553Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:10.5472456Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:10.5473011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:10.8604256Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:10.8613619Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:10.8632784Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:10.8633459Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:10.8645095Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:45:10.8646114Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:10.9019440Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:10.9020156Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:10.9032405Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:10.9033083Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:10.9113492Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:10.9116147Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:11.3157899Z ok (3.132s) 2022-05-18T04:45:11.3289304Z test_nested_all_wrapped_model_offload_false_prefetch_post_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41645 2022-05-18T04:45:11.3394672Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41646 2022-05-18T04:45:12.2428693Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpms12d6_g 2022-05-18T04:45:12.2429572Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpms12d6_g/_remote_module_non_scriptable.py 2022-05-18T04:45:12.2661671Z dist init r=1, world=2 2022-05-18T04:45:12.2666290Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:12.2787101Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9gck28ta 2022-05-18T04:45:12.2789785Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9gck28ta/_remote_module_non_scriptable.py 2022-05-18T04:45:12.3008622Z dist init r=0, world=2 2022-05-18T04:45:12.3012810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:12.3013724Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:12.3075037Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:13.6859042Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:13.6859573Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:14.0014455Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:14.0023469Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:14.0044433Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:45:14.0045204Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:14.0055803Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:14.0056451Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:14.0591061Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:14.0592065Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:14.0594595Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:14.0595298Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:14.0677706Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:14.0678210Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:14.4478994Z ok (3.132s) 2022-05-18T04:45:14.4611146Z test_nested_all_wrapped_model_offload_false_prefetch_post_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41732 2022-05-18T04:45:14.4718951Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41733 2022-05-18T04:45:15.4069421Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm6yudadn 2022-05-18T04:45:15.4070609Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm6yudadn/_remote_module_non_scriptable.py 2022-05-18T04:45:15.4099313Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzjcng6mi 2022-05-18T04:45:15.4102424Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzjcng6mi/_remote_module_non_scriptable.py 2022-05-18T04:45:15.4295933Z dist init r=0, world=2 2022-05-18T04:45:15.4300374Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:15.4333416Z dist init r=1, world=2 2022-05-18T04:45:15.4338095Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:15.4339197Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:15.4403951Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:16.8215255Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:16.8216233Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:17.1367697Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:17.1368276Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:45:17.1397451Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:17.1398149Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:17.1398992Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:17.1399645Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:17.1909764Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:17.1910448Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:17.1911943Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:17.1912647Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:17.1992681Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:17.1993205Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:17.5800869Z ok (3.132s) 2022-05-18T04:45:17.5928088Z test_nested_all_wrapped_model_offload_false_prefetch_post_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41819 2022-05-18T04:45:17.6036230Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41820 2022-05-18T04:45:18.4817646Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3wh80q4m 2022-05-18T04:45:18.4818832Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3wh80q4m/_remote_module_non_scriptable.py 2022-05-18T04:45:18.4977483Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp18z1ygu4 2022-05-18T04:45:18.4980008Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp18z1ygu4/_remote_module_non_scriptable.py 2022-05-18T04:45:18.5047705Z dist init r=0, world=2 2022-05-18T04:45:18.5052395Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:18.5197472Z dist init r=1, world=2 2022-05-18T04:45:18.5201690Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:18.5202721Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:18.5257930Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
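The recurring UserWarning from fully_sharded_data_parallel.py fires because the module handed to FSDP still lives on the CPU. One common way to avoid it, assuming no CPU offload is wanted, is to move the module to the rank's GPU before wrapping; the module and rank value here are illustrative:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    rank = 0                               # this process's GPU index (assumption)
    torch.cuda.set_device(rank)
    module = nn.Linear(16, 16).cuda()      # already on GPU, so FSDP has nothing to move
    fsdp_module = FSDP(module)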
2022-05-18T04:45:19.9173900Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:19.9174411Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:20.2314246Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:20.2314802Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:20.2343741Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:20.2344390Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:20.2345259Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:20.2345902Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:20.2853056Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:20.2853859Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:20.2856766Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:20.2857736Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:20.2939841Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:20.2940345Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:20.6115842Z ok (3.031s) 2022-05-18T04:45:20.6242735Z test_nested_all_wrapped_model_offload_false_prefetch_post_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41906 2022-05-18T04:45:20.6347621Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41907 2022-05-18T04:45:21.5241506Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8pz_yxak 2022-05-18T04:45:21.5242784Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8pz_yxak/_remote_module_non_scriptable.py 2022-05-18T04:45:21.5470071Z dist init r=1, world=2 2022-05-18T04:45:21.5474646Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:21.5706282Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9uqgy65n 2022-05-18T04:45:21.5708776Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9uqgy65n/_remote_module_non_scriptable.py 2022-05-18T04:45:21.5925769Z dist init r=0, world=2 2022-05-18T04:45:21.5930035Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:21.5931057Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:21.5984781Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:22.9772405Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:22.9773007Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:23.2879277Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:23.2888323Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:23.2909508Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:23.2910196Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:23.2920266Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:23.2920916Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:23.3436569Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:23.3437297Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:23.3438935Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:23.3439609Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:23.3523042Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
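The clip_norm_type_2_0 / clip_norm_type_None suffixes indicate whether the parity test clips gradients with an L2 norm or skips clipping. On the DDP reference model this is the standard utility shown below (using the ddp_model from the earlier sketch); FSDP exposes a corresponding clip_grad_norm_ method so that sharded gradients are handled correctly. The max_norm value is illustrative:

    import torch

    loss = ddp_model(torch.randn(8, 16, device="cuda")).sum()
    loss.backward()
    # norm_type=2.0 matches the _clip_norm_type_2_0 variants; the _None variants skip this.
    torch.nn.utils.clip_grad_norm_(ddp_model.parameters(), max_norm=1.0, norm_type=2.0)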
2022-05-18T04:45:23.3526109Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:23.7430321Z ok (3.131s) 2022-05-18T04:45:23.7561005Z test_nested_all_wrapped_model_offload_false_prefetch_pre_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 41993 2022-05-18T04:45:23.7667981Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 41994 2022-05-18T04:45:24.6579013Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpys2d45fo 2022-05-18T04:45:24.6580299Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpys2d45fo/_remote_module_non_scriptable.py 2022-05-18T04:45:24.6678984Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3abzpuua 2022-05-18T04:45:24.6681930Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3abzpuua/_remote_module_non_scriptable.py 2022-05-18T04:45:24.6810375Z dist init r=1, world=2 2022-05-18T04:45:24.6815311Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:24.6901566Z dist init r=0, world=2 2022-05-18T04:45:24.6905758Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:24.6906889Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:24.6918631Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:26.0733295Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:26.0733820Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:26.3886258Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:26.3895245Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:26.3915793Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:26.3916466Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:26.3926741Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:26.3927394Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:26.4297077Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:26.4297773Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:26.4307580Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:26.4308250Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:26.4389768Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:26.4390406Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:26.7749070Z ok (3.032s) 2022-05-18T04:45:26.7876717Z test_nested_all_wrapped_model_offload_false_prefetch_pre_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42080 2022-05-18T04:45:26.7981638Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42081 2022-05-18T04:45:27.6939139Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpay7cfbi5 2022-05-18T04:45:27.6940326Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpay7cfbi5/_remote_module_non_scriptable.py 2022-05-18T04:45:27.7030697Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjtsbihlm 2022-05-18T04:45:27.7033490Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjtsbihlm/_remote_module_non_scriptable.py 2022-05-18T04:45:27.7159382Z dist init r=0, world=2 2022-05-18T04:45:27.7163505Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:27.7262111Z dist init r=1, world=2 2022-05-18T04:45:27.7267009Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:27.7267817Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:27.7268518Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:29.1023807Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:29.1024366Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:29.4153024Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:29.4161843Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:29.4182043Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:29.4182730Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:29.4194382Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:29.4195031Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:29.4566640Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:45:29.4567339Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:29.4577317Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:29.4577989Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:29.4659285Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:29.4661910Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:29.8062363Z ok (3.031s) 2022-05-18T04:45:29.8192275Z test_nested_all_wrapped_model_offload_false_prefetch_pre_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42167 2022-05-18T04:45:29.8301232Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42168 2022-05-18T04:45:30.7262716Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps0uxayah 2022-05-18T04:45:30.7263893Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps0uxayah/_remote_module_non_scriptable.py 2022-05-18T04:45:30.7493444Z dist init r=1, world=2 2022-05-18T04:45:30.7498861Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:30.7770037Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpawjujpql 2022-05-18T04:45:30.7772905Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpawjujpql/_remote_module_non_scriptable.py 2022-05-18T04:45:30.7998779Z dist init r=0, world=2 2022-05-18T04:45:30.8003602Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:30.8005050Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:30.8008573Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:32.1602744Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:32.1603278Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:32.4693717Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:32.4703020Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:32.4723651Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:32.4724336Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:32.4735252Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:45:32.4735900Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:32.5269315Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:32.5270006Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:32.5273479Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:32.5274152Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:32.5355212Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:32.5356592Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:32.9389946Z ok (3.133s) 2022-05-18T04:45:32.9517676Z test_nested_all_wrapped_model_offload_false_prefetch_pre_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42254 2022-05-18T04:45:32.9622537Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42255 2022-05-18T04:45:33.8189093Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa4hy60jc 2022-05-18T04:45:33.8190251Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa4hy60jc/_remote_module_non_scriptable.py 2022-05-18T04:45:33.8412953Z dist init r=0, world=2 2022-05-18T04:45:33.8417513Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:33.8743397Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa0ljbp_v 2022-05-18T04:45:33.8746249Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa0ljbp_v/_remote_module_non_scriptable.py 2022-05-18T04:45:33.8973435Z dist init r=1, world=2 2022-05-18T04:45:33.8978293Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:33.8979260Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:33.9029576Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:35.2776277Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:35.2776816Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:35.5873854Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:35.5878949Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:35.5905093Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:45:35.5905789Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:35.5910135Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:35.5910783Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:35.6425514Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:35.6426231Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:35.6427175Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:35.6427839Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:35.6505780Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:35.6506291Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:35.9714042Z ok (3.032s) 2022-05-18T04:45:35.9841048Z test_nested_all_wrapped_model_offload_false_prefetch_pre_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42341 2022-05-18T04:45:35.9945579Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42342 2022-05-18T04:45:36.8933413Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptyfe3qkc 2022-05-18T04:45:36.8934739Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptyfe3qkc/_remote_module_non_scriptable.py 2022-05-18T04:45:36.8967085Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprerbv3mk 2022-05-18T04:45:36.8970154Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprerbv3mk/_remote_module_non_scriptable.py 2022-05-18T04:45:36.9155816Z dist init r=0, world=2 2022-05-18T04:45:36.9160258Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:36.9198968Z dist init r=1, world=2 2022-05-18T04:45:36.9203503Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:36.9204626Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:36.9263787Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:38.2896158Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:38.2896708Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:38.5976070Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:38.5976875Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
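The "dist init r=N, world=2" lines and the store_based_barrier_key entries are emitted while each rank joins the default process group; roughly, every worker executes something like the sketch below (assumed NCCL backend and localhost rendezvous; the address and port are placeholders, not values from this run):

    import os
    import torch.distributed as dist

    def init_worker(rank: int, world_size: int = 2) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        # init_process_group() performs the c10d store-based barrier whose
        # "Added key: store_based_barrier_key:1" / "Completed store-based barrier"
        # messages appear in the log above.
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)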
2022-05-18T04:45:38.6005846Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:38.6006553Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:38.6007394Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:38.6008035Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:38.6508859Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:38.6509582Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:38.6511786Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:38.6512449Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:38.6594021Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:38.6594894Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:39.0023926Z ok (3.031s) 2022-05-18T04:45:39.0149898Z test_nested_all_wrapped_model_offload_false_prefetch_pre_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42428 2022-05-18T04:45:39.0254648Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42429 2022-05-18T04:45:39.9301373Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt13z_ce8 2022-05-18T04:45:39.9302306Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt13z_ce8/_remote_module_non_scriptable.py 2022-05-18T04:45:39.9305815Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8nkd71o9 2022-05-18T04:45:39.9308952Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8nkd71o9/_remote_module_non_scriptable.py 2022-05-18T04:45:39.9525808Z dist init r=1, world=2 2022-05-18T04:45:39.9530019Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:39.9541901Z dist init r=0, world=2 2022-05-18T04:45:39.9546612Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:39.9547800Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:39.9633489Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
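The repeated UserWarning from fully_sharded_data_parallel.py above fires because the test passes a CPU-resident module to FullyShardedDataParallel, which temporarily moves it to the current CUDA device for parameter flattening and sharding. A caller can avoid the round trip by placing the module on its GPU before wrapping; a minimal sketch under that assumption (the helper name is illustrative, not from the test suite):

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(module: nn.Module, rank: int) -> FSDP:
        # Assumes init_process_group() has already run and `rank` maps to a local GPU.
        torch.cuda.set_device(rank)
        return FSDP(module.cuda(rank))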
2022-05-18T04:45:41.3489880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:41.3490632Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:41.6596795Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:41.6603878Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:41.6626667Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:41.6628017Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:41.6634742Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:41.6636037Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:41.7136993Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:41.7138429Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:41.7140373Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:41.7141630Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:41.7219748Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:41.7220739Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:42.0333067Z ok (3.031s) 2022-05-18T04:45:42.0458039Z test_nested_all_wrapped_model_offload_true_none_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42515 2022-05-18T04:45:42.0564402Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42516 2022-05-18T04:45:42.9591828Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwp95esp2 2022-05-18T04:45:42.9592699Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwp95esp2/_remote_module_non_scriptable.py 2022-05-18T04:45:42.9605913Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9u4vcu6x 2022-05-18T04:45:42.9608866Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9u4vcu6x/_remote_module_non_scriptable.py 2022-05-18T04:45:42.9813258Z dist init r=0, world=2 2022-05-18T04:45:42.9817430Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:42.9836106Z dist init r=1, world=2 2022-05-18T04:45:42.9841251Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:42.9842048Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:42.9921088Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:44.3725161Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:44.3726047Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:44.6833250Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:44.6842622Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:44.6863982Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:44.6864687Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:44.6874313Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:44.6874968Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:44.6988265Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:44.6990244Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:44.7496044Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:44.7496763Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:44.7506431Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:45:44.7507121Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:44.7587983Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:44.7590120Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:45.1646573Z ok (3.131s) 2022-05-18T04:45:45.1774518Z test_nested_all_wrapped_model_offload_true_none_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42602 2022-05-18T04:45:45.1879499Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42603 2022-05-18T04:45:46.0956805Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp83lkn6f 2022-05-18T04:45:46.0958155Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp83lkn6f/_remote_module_non_scriptable.py 2022-05-18T04:45:46.1188871Z dist init r=1, world=2 2022-05-18T04:45:46.1193316Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:46.1252408Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphli6l129 2022-05-18T04:45:46.1255717Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphli6l129/_remote_module_non_scriptable.py 2022-05-18T04:45:46.1475470Z dist init r=0, world=2 2022-05-18T04:45:46.1480097Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:46.1481192Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:46.1500135Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:47.5247572Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:47.5248442Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:47.8386138Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:47.8394823Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:47.8416802Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:47.8417470Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:47.8427217Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:47.8427853Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:47.8540094Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:47.8540578Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:47.9043183Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:47.9043874Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:47.9050549Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:47.9051960Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:47.9132196Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:47.9132971Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:48.2960151Z ok (3.131s) 2022-05-18T04:45:48.3089770Z test_nested_all_wrapped_model_offload_true_none_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42689 2022-05-18T04:45:48.3195766Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42690 2022-05-18T04:45:49.2219122Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvhow2g1m 2022-05-18T04:45:49.2220004Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvhow2g1m/_remote_module_non_scriptable.py 2022-05-18T04:45:49.2231482Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9awm1oxf 2022-05-18T04:45:49.2234490Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9awm1oxf/_remote_module_non_scriptable.py 2022-05-18T04:45:49.2443096Z dist init r=0, world=2 2022-05-18T04:45:49.2447524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:49.2464518Z dist init r=1, world=2 2022-05-18T04:45:49.2469366Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:49.2470487Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:49.2551111Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:50.6252109Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:50.6252641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:50.9344518Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:50.9352976Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:50.9374605Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:50.9375296Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:50.9385291Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:45:50.9385938Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:50.9503029Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:50.9504231Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:51.0136149Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:51.0136842Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:51.0139966Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:51.0140641Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:51.0221048Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:51.0223325Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:51.4277445Z ok (3.131s) 2022-05-18T04:45:51.4405739Z test_nested_all_wrapped_model_offload_true_none_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42776 2022-05-18T04:45:51.4509442Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42777 2022-05-18T04:45:52.3476987Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp29gt4k26 2022-05-18T04:45:52.3478412Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp29gt4k26/_remote_module_non_scriptable.py 2022-05-18T04:45:52.3526428Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4em1me0v 2022-05-18T04:45:52.3529886Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4em1me0v/_remote_module_non_scriptable.py 2022-05-18T04:45:52.3708593Z dist init r=1, world=2 2022-05-18T04:45:52.3713149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:52.3750836Z dist init r=0, world=2 2022-05-18T04:45:52.3755366Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:52.3756250Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:52.3817112Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:53.7437222Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:53.7437802Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:54.0569438Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:54.0578283Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
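The "Reducer buckets have been rebuilt in this iteration" INFO lines come from the DistributedDataParallel baseline these parity tests compare against: after the first backward pass DDP regroups gradients into communication buckets based on the order in which they actually become ready, and logs once when it does so. A bare-bones sketch of the DDP side (illustrative; the model and device choice are assumptions):

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    # Assumes the process group is initialized and one GPU per rank.
    device = torch.device("cuda", torch.cuda.current_device())
    ddp_model = DDP(nn.Linear(8, 8).to(device), device_ids=[device.index])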
2022-05-18T04:45:54.0599371Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:54.0600055Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:54.0609709Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:54.0610528Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:54.0727660Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:54.0728370Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:54.1367525Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:54.1368347Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:54.1372200Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:54.1372868Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:54.1452910Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:54.1454804Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:54.5594088Z ok (3.131s) 2022-05-18T04:45:54.5726236Z test_nested_all_wrapped_model_offload_true_none_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42863 2022-05-18T04:45:54.5837475Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42864 2022-05-18T04:45:55.4991575Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpot3oh2po 2022-05-18T04:45:55.4992667Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpot3oh2po/_remote_module_non_scriptable.py 2022-05-18T04:45:55.5213853Z dist init r=0, world=2 2022-05-18T04:45:55.5217770Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:55.5276577Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5ypyujyp 2022-05-18T04:45:55.5279340Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5ypyujyp/_remote_module_non_scriptable.py 2022-05-18T04:45:55.5497631Z dist init r=1, world=2 2022-05-18T04:45:55.5501666Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:55.5502619Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:45:55.5524329Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:56.9077447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:45:56.9077984Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:45:57.2179926Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:57.2180794Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:57.2208702Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:57.2209515Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:57.2210613Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:45:57.2211244Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:45:57.2325663Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:57.2326185Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:57.2932864Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:57.2935163Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:57.2936096Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:45:57.2936767Z warnings.warn(msg, FutureWarning) 2022-05-18T04:45:57.3016599Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:57.3017101Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:45:57.6914987Z ok (3.132s) 2022-05-18T04:45:57.7041463Z test_nested_all_wrapped_model_offload_true_none_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 42950 2022-05-18T04:45:57.7145803Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 42951 2022-05-18T04:45:58.6169246Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt6u8i407 2022-05-18T04:45:58.6170463Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt6u8i407/_remote_module_non_scriptable.py 2022-05-18T04:45:58.6193500Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpweftra6e 2022-05-18T04:45:58.6196546Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpweftra6e/_remote_module_non_scriptable.py 2022-05-18T04:45:58.6399878Z dist init r=1, world=2 2022-05-18T04:45:58.6404476Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:45:58.6419914Z dist init r=0, world=2 2022-05-18T04:45:58.6424369Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:45:58.6425230Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:45:58.6508293Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:00.0483535Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:00.0484079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:00.3577645Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:00.3586373Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:00.3606364Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:00.3607261Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:00.3618246Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:00.3618892Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:00.3736059Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:00.3736853Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:00.4361672Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:00.4362371Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:00.4363302Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:46:00.4363934Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:00.4443473Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:00.4445337Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:00.8234451Z ok (3.132s) 2022-05-18T04:46:00.8360957Z test_nested_all_wrapped_model_offload_true_prefetch_post_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43037 2022-05-18T04:46:00.8464943Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43038 2022-05-18T04:46:01.7461543Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplimxj2j9 2022-05-18T04:46:01.7462411Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplimxj2j9/_remote_module_non_scriptable.py 2022-05-18T04:46:01.7485880Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8092xx7v 2022-05-18T04:46:01.7488824Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8092xx7v/_remote_module_non_scriptable.py 2022-05-18T04:46:01.7682198Z dist init r=0, world=2 2022-05-18T04:46:01.7686522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:01.7719297Z dist init r=1, world=2 2022-05-18T04:46:01.7724090Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:01.7725210Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:01.7789953Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:03.1644843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:03.1645387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:03.4740901Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:03.4748692Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:03.4770157Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:03.4771046Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:03.4780103Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:03.4780755Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:03.4889346Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:03.4889847Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:03.5381827Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:03.5382514Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:03.5383799Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:03.5384476Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:03.5462706Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:03.5463449Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:03.8546094Z ok (3.031s) 2022-05-18T04:46:03.8675706Z test_nested_all_wrapped_model_offload_true_prefetch_post_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43124 2022-05-18T04:46:03.8781656Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43125 2022-05-18T04:46:04.7739191Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8ip9lkg3 2022-05-18T04:46:04.7740063Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8ip9lkg3/_remote_module_non_scriptable.py 2022-05-18T04:46:04.7968592Z dist init r=1, world=2 2022-05-18T04:46:04.7973355Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:04.8225070Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpitakc5rx 2022-05-18T04:46:04.8227887Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpitakc5rx/_remote_module_non_scriptable.py 2022-05-18T04:46:04.8447910Z dist init r=0, world=2 2022-05-18T04:46:04.8452311Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:04.8453502Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:04.8483220Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:06.2326947Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:06.2327468Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:06.5401424Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:06.5410099Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:06.5431346Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:06.5432022Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:06.5441450Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:46:06.5442119Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:06.5550157Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:06.5550669Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:06.6047654Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:06.6048340Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:06.6051128Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:06.6051791Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:06.6130538Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:06.6131225Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:06.9863933Z ok (3.132s) 2022-05-18T04:46:06.9994848Z test_nested_all_wrapped_model_offload_true_prefetch_post_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43211 2022-05-18T04:46:07.0105155Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43212 2022-05-18T04:46:07.9642265Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpllgr4bh6 2022-05-18T04:46:07.9643348Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpllgr4bh6/_remote_module_non_scriptable.py 2022-05-18T04:46:07.9692562Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjt_4ibtb 2022-05-18T04:46:07.9695836Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjt_4ibtb/_remote_module_non_scriptable.py 2022-05-18T04:46:07.9865833Z dist init r=0, world=2 2022-05-18T04:46:07.9869932Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:07.9924746Z dist init r=1, world=2 2022-05-18T04:46:07.9929279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:07.9930318Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:07.9973542Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:09.3673136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:09.3673650Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:09.6771134Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:09.6771677Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
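The FutureWarning repeated throughout this run points at the torch.testing.assert_allclose deprecation tracked in pytorch/pytorch#61844; the replacement is torch.testing.assert_close, which picks dtype-based tolerances unless rtol/atol are passed explicitly. A one-line migration example (illustrative, not taken from the test source):

    import torch
    from torch.testing import assert_close

    expected = torch.randn(4)
    actual = expected.clone()
    # Drop-in replacement for the deprecated torch.testing.assert_allclose(actual, expected).
    assert_close(actual, expected)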
2022-05-18T04:46:09.6800379Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:09.6801354Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:09.6802297Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:09.6802946Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:09.6915233Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:09.6915742Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:09.7541930Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:09.7542952Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:09.7544121Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:09.7544774Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:09.7622078Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:09.7622584Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:10.1187280Z ok (3.132s) 2022-05-18T04:46:10.1315148Z test_nested_all_wrapped_model_offload_true_prefetch_post_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43298 2022-05-18T04:46:10.1420862Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43299 2022-05-18T04:46:11.0433015Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmyznocf0 2022-05-18T04:46:11.0433875Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmyznocf0/_remote_module_non_scriptable.py 2022-05-18T04:46:11.0554755Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7lgjj826 2022-05-18T04:46:11.0557637Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7lgjj826/_remote_module_non_scriptable.py 2022-05-18T04:46:11.0655694Z dist init r=0, world=2 2022-05-18T04:46:11.0659884Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:11.0787980Z dist init r=1, world=2 2022-05-18T04:46:11.0792663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:11.0793645Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
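The UserWarning from fully_sharded_data_parallel.py:912 fires because the test hands FSDP a module that still lives on CPU, so the wrapper temporarily moves it to the current CUDA device for parameter verification, flattening, and sharding. A minimal sketch of the wrapping pattern that avoids that round trip when CPU offload is not wanted (the Linear module is a placeholder, and a default process group is assumed to be initialized already, as in the harness above):

import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

device = torch.cuda.current_device()
module = torch.nn.Linear(16, 16).to(device)  # move to the GPU before wrapping
fsdp_module = FSDP(module)                   # no CPU -> GPU -> CPU shuffle needed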
2022-05-18T04:46:11.0865352Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:12.4567605Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:12.4568171Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:12.7698424Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:12.7707511Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:12.7727889Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:12.7728853Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:12.7739188Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:12.7739839Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:12.7856823Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:12.7859068Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:12.8500523Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:12.8501228Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:12.8504544Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:12.8505217Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:12.8585135Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:12.8587469Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:13.2503098Z ok (3.131s) 2022-05-18T04:46:13.2632299Z test_nested_all_wrapped_model_offload_true_prefetch_post_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43385 2022-05-18T04:46:13.2738730Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43386 2022-05-18T04:46:14.1770513Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4jv5fe99 2022-05-18T04:46:14.1771573Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4jv5fe99/_remote_module_non_scriptable.py 2022-05-18T04:46:14.1999193Z dist init r=1, world=2 2022-05-18T04:46:14.2003561Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:14.2211923Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpey_34fr2 2022-05-18T04:46:14.2214946Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpey_34fr2/_remote_module_non_scriptable.py 2022-05-18T04:46:14.2437718Z dist init r=0, world=2 2022-05-18T04:46:14.2442554Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:14.2443494Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:14.2513890Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:15.6344692Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:15.6345231Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:15.9457769Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:15.9458324Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:15.9486659Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:15.9487621Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:15.9488579Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:15.9489235Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:15.9603971Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:15.9604450Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:16.0216110Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:16.0217020Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:16.0218460Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844.
2022-05-18T04:46:16.0219119Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:16.0298926Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:16.0299433Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:16.3820551Z ok (3.132s) 2022-05-18T04:46:16.3949377Z test_nested_all_wrapped_model_offload_true_prefetch_post_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43472 2022-05-18T04:46:16.4055611Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43473 2022-05-18T04:46:17.3034736Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjud6vfrl 2022-05-18T04:46:17.3035651Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjud6vfrl/_remote_module_non_scriptable.py 2022-05-18T04:46:17.3059244Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyf5ytua1 2022-05-18T04:46:17.3062114Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyf5ytua1/_remote_module_non_scriptable.py 2022-05-18T04:46:17.3266299Z dist init r=0, world=2 2022-05-18T04:46:17.3270655Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:17.3285435Z dist init r=1, world=2 2022-05-18T04:46:17.3289860Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:17.3290995Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:17.3374518Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:18.7162752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:18.7163310Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:19.0268083Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:19.0268651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:19.0298997Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:19.0300084Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:19.0300944Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:19.0301682Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:19.0419463Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:19.0419971Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:19.1050222Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:19.1051112Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:19.1053855Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:19.1054531Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:19.1136351Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:19.1136834Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:19.5137725Z ok (3.132s) 2022-05-18T04:46:19.5265687Z test_nested_all_wrapped_model_offload_true_prefetch_pre_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43559 2022-05-18T04:46:19.5373518Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43560 2022-05-18T04:46:20.4627829Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpouaelk8a 2022-05-18T04:46:20.4628763Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpouaelk8a/_remote_module_non_scriptable.py 2022-05-18T04:46:20.4755909Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_6upe3m0 2022-05-18T04:46:20.4758720Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_6upe3m0/_remote_module_non_scriptable.py 2022-05-18T04:46:20.4851245Z dist init r=1, world=2 2022-05-18T04:46:20.4855847Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:20.4984301Z dist init r=0, world=2 2022-05-18T04:46:20.4988671Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:20.4989674Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:20.5061107Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:21.8790448Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:21.8790991Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:22.1887679Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:22.1888208Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:22.1918087Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:22.1919061Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:22.1919916Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:46:22.1920656Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:22.2031489Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:22.2032001Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:22.2541559Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:22.2542251Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:22.2551474Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:22.2552130Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:22.2633847Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:22.2634353Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:22.6466677Z ok (3.133s) 2022-05-18T04:46:22.6599799Z test_nested_all_wrapped_model_offload_true_prefetch_pre_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43646 2022-05-18T04:46:22.6708771Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43647 2022-05-18T04:46:23.5693128Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb66ikzyu 2022-05-18T04:46:23.5694410Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb66ikzyu/_remote_module_non_scriptable.py 2022-05-18T04:46:23.5922313Z dist init r=0, world=2 2022-05-18T04:46:23.5926765Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:23.6110535Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv7tdl6i0 2022-05-18T04:46:23.6113576Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv7tdl6i0/_remote_module_non_scriptable.py 2022-05-18T04:46:23.6336418Z dist init r=1, world=2 2022-05-18T04:46:23.6340925Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:23.6341735Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:23.6437769Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:25.0340955Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:25.0341522Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:25.3408519Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:25.3414730Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:46:25.3438427Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:25.3439097Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:25.3445172Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:25.3445824Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:25.3557200Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:25.3557722Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:25.4062588Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:25.4063317Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:25.4064262Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:25.4064928Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:25.4144241Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:25.4144722Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:25.7802408Z ok (3.133s) 2022-05-18T04:46:25.7930479Z test_nested_all_wrapped_model_offload_true_prefetch_pre_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43733 2022-05-18T04:46:25.8035443Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43734 2022-05-18T04:46:26.7459017Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc1hg_egv 2022-05-18T04:46:26.7459937Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc1hg_egv/_remote_module_non_scriptable.py 2022-05-18T04:46:26.7681950Z dist init r=1, world=2 2022-05-18T04:46:26.7686585Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:26.7720163Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp329qv71h 2022-05-18T04:46:26.7723054Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp329qv71h/_remote_module_non_scriptable.py 2022-05-18T04:46:26.7949108Z dist init r=0, world=2 2022-05-18T04:46:26.7954171Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:26.7955003Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
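Each "dist init r=N, world=2" line is followed by the store-based barrier that torch.distributed.init_process_group runs so both ranks agree the group exists before the test body starts. A minimal sketch of that per-rank setup (the TCP rendezvous values are placeholders; the internal test harness may use a file store instead):

import os
import torch.distributed as dist

def dist_init(rank: int, world_size: int) -> None:
    print(f"dist init r={rank}, world={world_size}")  # matches the harness output above
    # Placeholder rendezvous settings; the barrier messages in the log come
    # from the init_process_group call below.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)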
2022-05-18T04:46:26.7993096Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:28.1760416Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:28.1760969Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:28.4853296Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:28.4853884Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:28.4882989Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:28.4883656Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:28.4884797Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:28.4885441Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:28.5000942Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:28.5001474Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:28.5638659Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:28.5639336Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:28.5643137Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:28.5643800Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:28.5723592Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:28.5724083Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:28.9129328Z ok (3.133s) 2022-05-18T04:46:28.9257534Z test_nested_all_wrapped_model_offload_true_prefetch_pre_none_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43820 2022-05-18T04:46:28.9363030Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43821 2022-05-18T04:46:29.8374599Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpis5i5lov 2022-05-18T04:46:29.8375740Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu5mbxhh8 2022-05-18T04:46:29.8376825Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpis5i5lov/_remote_module_non_scriptable.py 2022-05-18T04:46:29.8377966Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu5mbxhh8/_remote_module_non_scriptable.py 2022-05-18T04:46:29.8596719Z dist init r=0, world=2 2022-05-18T04:46:29.8601131Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:29.8607764Z dist init r=1, world=2 2022-05-18T04:46:29.8612940Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:29.8614370Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:29.8704977Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:31.2301512Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:31.2302508Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:31.5393074Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:31.5402369Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:31.5423492Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:31.5424787Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:31.5436853Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:31.5438586Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:31.5556032Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:31.5557263Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:31.6204929Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:31.6206347Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:31.6208615Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:46:31.6209991Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:31.6292032Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:31.6293348Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:32.0445580Z ok (3.131s) 2022-05-18T04:46:32.0576887Z test_nested_all_wrapped_model_offload_true_prefetch_pre_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43907 2022-05-18T04:46:32.0686176Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43908 2022-05-18T04:46:32.9716539Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqbdlfdnp 2022-05-18T04:46:32.9717795Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqbdlfdnp/_remote_module_non_scriptable.py 2022-05-18T04:46:32.9949327Z dist init r=0, world=2 2022-05-18T04:46:32.9954163Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:33.0086315Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq9rklgyb 2022-05-18T04:46:33.0089242Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq9rklgyb/_remote_module_non_scriptable.py 2022-05-18T04:46:33.0311782Z dist init r=1, world=2 2022-05-18T04:46:33.0316844Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:33.0318301Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:33.0363955Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:34.4142843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:34.4143822Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:34.7274354Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:34.7275045Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:34.7303859Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:34.7305170Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:34.7306815Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:34.7308510Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:34.7421417Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:34.7422639Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:34.8036104Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:34.8037533Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:34.8039471Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:34.8040713Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:34.8119019Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:34.8120030Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:35.1768873Z ok (3.132s) 2022-05-18T04:46:35.1900972Z test_nested_all_wrapped_model_offload_true_prefetch_pre_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 43994 2022-05-18T04:46:35.2009338Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 43995 2022-05-18T04:46:36.1043353Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjbwgm60l 2022-05-18T04:46:36.1044581Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjbwgm60l/_remote_module_non_scriptable.py 2022-05-18T04:46:36.1171562Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp980q1n4e 2022-05-18T04:46:36.1175021Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp980q1n4e/_remote_module_non_scriptable.py 2022-05-18T04:46:36.1264012Z dist init r=1, world=2 2022-05-18T04:46:36.1268234Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:36.1409241Z dist init r=0, world=2 2022-05-18T04:46:36.1414742Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:36.1416177Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:36.1474054Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:37.5218772Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:37.5219758Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:37.8399590Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:37.8400601Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:37.8429303Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:37.8430664Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:37.8432353Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:46:37.8433982Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:37.8552820Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:37.8554081Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:37.9186828Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:37.9188191Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:37.9191235Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:37.9192671Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:37.9276176Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:37.9277168Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:38.3093877Z ok (3.132s) 2022-05-18T04:46:38.3221557Z test_nested_wrapped_model_offload_false_none_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44081 2022-05-18T04:46:38.3326964Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44082 2022-05-18T04:46:39.2572605Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdgg3870y 2022-05-18T04:46:39.2573540Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdgg3870y/_remote_module_non_scriptable.py 2022-05-18T04:46:39.2693351Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy3l5fsgv 2022-05-18T04:46:39.2696010Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy3l5fsgv/_remote_module_non_scriptable.py 2022-05-18T04:46:39.2796410Z dist init r=1, world=2 2022-05-18T04:46:39.2800425Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:39.2916697Z dist init r=0, world=2 2022-05-18T04:46:39.2920673Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:39.2921483Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:39.3005819Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:40.6755697Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:40.6756261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:40.9861098Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:40.9861657Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
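The test names encode the FSDP configuration being compared against DDP: cpu_offload true/false, backward prefetch pre/post/none, sharding strategy (no_shard, shard_grad_op, or the default full shard), and the gradient-clipping norm type. A minimal sketch of one such combination, offload_true + prefetch_post + shard_grad_op, assuming the CPUOffload, BackwardPrefetch, and ShardingStrategy options exposed by torch.distributed.fsdp in this build (process-group setup omitted, module is a placeholder):

import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    CPUOffload,
    BackwardPrefetch,
    ShardingStrategy,
)

fsdp_model = FSDP(
    torch.nn.Linear(8, 8).cuda(),
    cpu_offload=CPUOffload(offload_params=True),        # offload_true
    backward_prefetch=BackwardPrefetch.BACKWARD_POST,   # prefetch_post
    sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,   # shard_grad_op
)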
2022-05-18T04:46:40.9893283Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:40.9893948Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:40.9894772Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:40.9895735Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:41.0263616Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:41.0264341Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:41.0265258Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:41.0265905Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:41.0356310Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:41.0356825Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:41.3407109Z ok (3.031s) 2022-05-18T04:46:41.3532147Z test_nested_wrapped_model_offload_false_none_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44168 2022-05-18T04:46:41.3636848Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44169 2022-05-18T04:46:42.2752065Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsjid42ho 2022-05-18T04:46:42.2753031Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsjid42ho/_remote_module_non_scriptable.py 2022-05-18T04:46:42.2983033Z dist init r=0, world=2 2022-05-18T04:46:42.2987459Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:42.3050349Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfg2jmjo8 2022-05-18T04:46:42.3053878Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfg2jmjo8/_remote_module_non_scriptable.py 2022-05-18T04:46:42.3273900Z dist init r=1, world=2 2022-05-18T04:46:42.3278324Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:42.3279436Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:42.3294286Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:46:43.7308981Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:43.7309569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:44.0467308Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:44.0467885Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:44.0501561Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:44.0502235Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:44.0503055Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:44.0503689Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:44.1035207Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:44.1036159Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:44.1037994Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:44.1038654Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:44.1137175Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:44.1137675Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:44.4717529Z ok (3.131s) 2022-05-18T04:46:44.4844652Z test_nested_wrapped_model_offload_false_none_shard_grad_op (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44255 2022-05-18T04:46:44.4954525Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44256 2022-05-18T04:46:45.3951853Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph0e3jb6f 2022-05-18T04:46:45.3952758Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph0e3jb6f/_remote_module_non_scriptable.py 2022-05-18T04:46:45.4103703Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpivyl9rto 2022-05-18T04:46:45.4106363Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpivyl9rto/_remote_module_non_scriptable.py 2022-05-18T04:46:45.4169040Z dist init r=1, world=2 2022-05-18T04:46:45.4173474Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:45.4325262Z dist init r=0, world=2 2022-05-18T04:46:45.4329274Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:45.4330359Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:45.4378713Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:46.7900237Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:46.7900806Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:47.0995623Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:47.1003306Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:47.1029904Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:47.1031235Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:47.1038141Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:47.1039461Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:47.1533199Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:47.1534586Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:47.1536542Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:47.1538277Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:47.1632116Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
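The recurring "Reducer buckets have been rebuilt in this iteration." lines (one per rank) come from the DDP reference model that each parity test trains alongside the FSDP model; DDP rebuilds its gradient buckets once, typically after the first backward pass. A minimal sketch of that reference wrapper (module and device id are illustrative, and the process group from the setup sketch above is assumed to be initialized):

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

rank = torch.cuda.current_device()
ddp_model = DDP(torch.nn.Linear(16, 16).to(rank), device_ids=[rank])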
2022-05-18T04:46:47.1633282Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:47.5032513Z ok (3.031s) 2022-05-18T04:46:47.5158626Z test_nested_wrapped_model_offload_false_prefetch_post_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44342 2022-05-18T04:46:47.5263904Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44343 2022-05-18T04:46:48.4316074Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzuc82osh 2022-05-18T04:46:48.4317576Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzuc82osh/_remote_module_non_scriptable.py 2022-05-18T04:46:48.4546524Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5_whglyp 2022-05-18T04:46:48.4547468Z dist init r=0, world=2 2022-05-18T04:46:48.4548756Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5_whglyp/_remote_module_non_scriptable.py 2022-05-18T04:46:48.4552643Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:48.4769348Z dist init r=1, world=2 2022-05-18T04:46:48.4773802Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:48.4774868Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:48.4859751Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:49.8526379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:49.8526920Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:50.1626405Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:50.1626967Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:50.1658702Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:50.1659374Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:50.1660409Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:50.1661063Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:50.2032254Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:50.2032946Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:50.2043869Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:50.2044550Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:50.2138328Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:50.2140569Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:50.5344213Z ok (3.031s) 2022-05-18T04:46:50.5469170Z test_nested_wrapped_model_offload_false_prefetch_post_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44429 2022-05-18T04:46:50.5574238Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44430 2022-05-18T04:46:51.4539319Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcqot0ks5 2022-05-18T04:46:51.4540996Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcqot0ks5/_remote_module_non_scriptable.py 2022-05-18T04:46:51.4769470Z dist init r=0, world=2 2022-05-18T04:46:51.4772602Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmbuauvf4 2022-05-18T04:46:51.4774527Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:51.4775529Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmbuauvf4/_remote_module_non_scriptable.py 2022-05-18T04:46:51.4994198Z dist init r=1, world=2 2022-05-18T04:46:51.4998097Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:51.4999097Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:51.5081433Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:52.8925393Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:52.8925921Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:53.2055922Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:53.2056714Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:53.2089210Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:53.2089912Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:53.2091063Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:53.2091713Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:53.2612457Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:46:53.2613143Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:53.2615711Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:53.2616362Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:53.2711584Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:53.2712309Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:53.6656942Z ok (3.131s) 2022-05-18T04:46:53.6785126Z test_nested_wrapped_model_offload_false_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44516 2022-05-18T04:46:53.6892479Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44517 2022-05-18T04:46:54.5495708Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8my516tm 2022-05-18T04:46:54.5497071Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8my516tm/_remote_module_non_scriptable.py 2022-05-18T04:46:54.5501214Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph55aty9z 2022-05-18T04:46:54.5504272Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph55aty9z/_remote_module_non_scriptable.py 2022-05-18T04:46:54.5718812Z dist init r=1, world=2 2022-05-18T04:46:54.5723859Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:54.5726452Z dist init r=0, world=2 2022-05-18T04:46:54.5732126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:54.5733566Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:54.5827458Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:55.9396137Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:55.9397102Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:56.2511557Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:56.2512597Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:56.2545657Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:56.2546991Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:56.2548636Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:46:56.2549896Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:56.3051067Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:56.3052482Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:56.3054465Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:56.3055822Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:56.3147492Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:56.3148478Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:56.6983905Z ok (3.033s) 2022-05-18T04:46:56.7109125Z test_nested_wrapped_model_offload_false_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44603 2022-05-18T04:46:56.7216298Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44604 2022-05-18T04:46:57.6280358Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjk9kt2bj 2022-05-18T04:46:57.6281449Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjk9kt2bj/_remote_module_non_scriptable.py 2022-05-18T04:46:57.6294902Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxtcnvm1e 2022-05-18T04:46:57.6298225Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxtcnvm1e/_remote_module_non_scriptable.py 2022-05-18T04:46:57.6502355Z dist init r=1, world=2 2022-05-18T04:46:57.6506302Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:46:57.6528803Z dist init r=0, world=2 2022-05-18T04:46:57.6533687Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:46:57.6534814Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:57.6610135Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:46:59.0421157Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:46:59.0421708Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:46:59.3523132Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:59.3523665Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:59.3556092Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:46:59.3556773Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:59.3557621Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:46:59.3558273Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:46:59.3940132Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:59.3940833Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:59.3948771Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:46:59.3949453Z warnings.warn(msg, FutureWarning) 2022-05-18T04:46:59.4048318Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:59.4048803Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:46:59.7305572Z ok (3.032s) 2022-05-18T04:46:59.7429701Z test_nested_wrapped_model_offload_false_prefetch_pre_none (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44690 2022-05-18T04:46:59.7533697Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44691 2022-05-18T04:47:00.6487000Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwojlx2vs 2022-05-18T04:47:00.6488012Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwojlx2vs/_remote_module_non_scriptable.py 2022-05-18T04:47:00.6586654Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbps3p96b 2022-05-18T04:47:00.6589369Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbps3p96b/_remote_module_non_scriptable.py 2022-05-18T04:47:00.6719084Z dist init r=1, world=2 2022-05-18T04:47:00.6723790Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:00.6811376Z dist init r=0, world=2 2022-05-18T04:47:00.6816322Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:00.6817157Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:00.6827013Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:02.0668545Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:02.0669090Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:02.3791824Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:02.3801874Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:47:02.3824613Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:02.3825288Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:02.3838367Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:02.3839018Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:02.4376446Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:02.4377161Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:02.4380520Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:02.4381186Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:02.4477318Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:02.4478468Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:02.8618935Z ok (3.131s) 2022-05-18T04:47:02.8747842Z test_nested_wrapped_model_offload_false_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44777 2022-05-18T04:47:02.8850587Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44778 2022-05-18T04:47:03.8245811Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_v7qjvyc 2022-05-18T04:47:03.8246839Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_v7qjvyc/_remote_module_non_scriptable.py 2022-05-18T04:47:03.8254556Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjbwsn3lz 2022-05-18T04:47:03.8257259Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjbwsn3lz/_remote_module_non_scriptable.py 2022-05-18T04:47:03.8466215Z dist init r=0, world=2 2022-05-18T04:47:03.8470334Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:03.8475861Z dist init r=1, world=2 2022-05-18T04:47:03.8480299Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:03.8481370Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:03.8574437Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:47:05.2194255Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:05.2194819Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:05.5264609Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:05.5265173Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:05.5297498Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:05.5298177Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:05.5299027Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:05.5299665Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:05.5798211Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:05.5798904Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:05.5799814Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:05.5800458Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:05.5891059Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:05.5891575Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:05.9936791Z ok (3.132s) 2022-05-18T04:47:06.0064975Z test_nested_wrapped_model_offload_true_none_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44864 2022-05-18T04:47:06.0170952Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44865 2022-05-18T04:47:06.9199269Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp6lzjpp_ 2022-05-18T04:47:06.9200148Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp6lzjpp_/_remote_module_non_scriptable.py 2022-05-18T04:47:06.9243360Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo02ini3o 2022-05-18T04:47:06.9246228Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo02ini3o/_remote_module_non_scriptable.py 2022-05-18T04:47:06.9421249Z dist init r=0, world=2 2022-05-18T04:47:06.9425415Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:06.9474977Z dist init r=1, world=2 2022-05-18T04:47:06.9479555Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:06.9480710Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:06.9529417Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:08.3379937Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:08.3380781Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:08.6500993Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:08.6510538Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:08.6533043Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:08.6533717Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:08.6547104Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:08.6547753Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:08.6662157Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:08.6664674Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:08.6684090Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:08.6685400Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:47:08.6686687Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:08.6687953Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:08.6689373Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:08.6690957Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:08.6692246Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:08.6693796Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:08.7210453Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:08.7211359Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:08.7219399Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:08.7220067Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:08.7314107Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:08.7316782Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:09.1255054Z ok (3.132s) 2022-05-18T04:47:09.1383814Z test_nested_wrapped_model_offload_true_none_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 44951 2022-05-18T04:47:09.1493946Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 44952 2022-05-18T04:47:10.0508750Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm9qap1pm 2022-05-18T04:47:10.0509597Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm9qap1pm/_remote_module_non_scriptable.py 2022-05-18T04:47:10.0642007Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkwyf96c2 2022-05-18T04:47:10.0645010Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkwyf96c2/_remote_module_non_scriptable.py 2022-05-18T04:47:10.0729435Z dist init r=1, world=2 2022-05-18T04:47:10.0733896Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:10.0876099Z dist init r=0, world=2 2022-05-18T04:47:10.0880898Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:10.0881663Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:10.0939365Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:11.4708065Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:11.4708625Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:11.7801626Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:11.7802148Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:11.7834545Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:11.7835320Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:11.7836187Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:11.7837165Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:11.7949153Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:11.7949971Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:11.7951103Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:11.7951881Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:11.7958445Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:11.7958954Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:11.8595840Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:11.8596531Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:11.8598112Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:11.8598775Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:11.8690471Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:11.8690959Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:12.2573836Z ok (3.132s) 2022-05-18T04:47:12.2700359Z test_nested_wrapped_model_offload_true_none_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45038 2022-05-18T04:47:12.2804594Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45039 2022-05-18T04:47:13.1792090Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4k7_a_2t 2022-05-18T04:47:13.1793272Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4k7_a_2t/_remote_module_non_scriptable.py 2022-05-18T04:47:13.2021791Z dist init r=1, world=2 2022-05-18T04:47:13.2026217Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:13.2209961Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm1vtnxgu 2022-05-18T04:47:13.2212537Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm1vtnxgu/_remote_module_non_scriptable.py 2022-05-18T04:47:13.2432027Z dist init r=0, world=2 2022-05-18T04:47:13.2436519Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:13.2437320Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:13.2536742Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:14.6073390Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:14.6073942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:14.9160185Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:14.9170269Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
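The "dist init r=N, world=2" and store_based_barrier_key lines above are emitted while each spawned worker initializes its process group. A per-rank sketch of that setup, assuming an NCCL backend and environment-variable rendezvous; the harness in torch.testing._internal.common_distributed wires this up differently:

    import os
    import torch.distributed as dist

    def init_worker(rank: int, world_size: int = 2) -> None:
        # Illustrative rendezvous settings; the test harness picks its own.
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        # Logs "Added key: store_based_barrier_key:1 to store for rank: N" and,
        # once every rank has joined, "Completed store-based barrier".
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)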
2022-05-18T04:47:14.9194115Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:14.9194896Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:14.9205811Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:14.9206436Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:14.9324018Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:14.9324825Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:14.9325965Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:14.9326742Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:14.9332551Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:14.9335257Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:14.9977938Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:14.9978613Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:14.9980587Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:14.9981263Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:15.0075482Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:15.0077123Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:15.3886655Z ok (3.131s) 2022-05-18T04:47:15.4014188Z test_nested_wrapped_model_offload_true_prefetch_post_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45125 2022-05-18T04:47:15.4120118Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45126 2022-05-18T04:47:16.3071760Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiqd6bzu7 2022-05-18T04:47:16.3072662Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiqd6bzu7/_remote_module_non_scriptable.py 2022-05-18T04:47:16.3121799Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkl6r0j6e 2022-05-18T04:47:16.3124918Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkl6r0j6e/_remote_module_non_scriptable.py 2022-05-18T04:47:16.3292747Z dist init r=1, world=2 2022-05-18T04:47:16.3297183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:16.3356309Z dist init r=0, world=2 2022-05-18T04:47:16.3361049Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:16.3362182Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:16.3400561Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:17.7087944Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:17.7088477Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:18.0267028Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:18.0267575Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:18.0307599Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:18.0308271Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:18.0309130Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:18.0309760Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:18.0430930Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:18.0431442Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:18.0460706Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:18.0462014Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:47:18.0463293Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:18.0464578Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:18.0465834Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:18.0467102Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:18.0468769Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:18.0470046Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:18.1097495Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:18.1098191Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:18.1104280Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:18.1104944Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:18.1206493Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:18.1207002Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:18.5202931Z ok (3.131s) 2022-05-18T04:47:18.5331281Z test_nested_wrapped_model_offload_true_prefetch_post_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45212 2022-05-18T04:47:18.5437540Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45213 2022-05-18T04:47:19.4442788Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz5c_voms 2022-05-18T04:47:19.4443986Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz5c_voms/_remote_module_non_scriptable.py 2022-05-18T04:47:19.4473806Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo4l36cuh 2022-05-18T04:47:19.4476774Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo4l36cuh/_remote_module_non_scriptable.py 2022-05-18T04:47:19.4667864Z dist init r=0, world=2 2022-05-18T04:47:19.4672452Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:19.4706325Z dist init r=1, world=2 2022-05-18T04:47:19.4710786Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:19.4711735Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:19.4776554Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:20.8662182Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:20.8663166Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:21.1807917Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:21.1818034Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:21.1842323Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:21.1843991Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:21.1854221Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:21.1855556Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:21.1972095Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:21.1973660Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:21.1975860Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:21.1977359Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:21.1982013Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:21.1982941Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:21.2637369Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:21.2638799Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:21.2641009Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:21.2642282Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:21.2735502Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:21.2736483Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:21.6520467Z ok (3.132s) 2022-05-18T04:47:21.6644295Z test_nested_wrapped_model_offload_true_prefetch_post_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45299 2022-05-18T04:47:21.6750851Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45300 2022-05-18T04:47:22.5701899Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpozkiawm6 2022-05-18T04:47:22.5702880Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpozkiawm6/_remote_module_non_scriptable.py 2022-05-18T04:47:22.5825577Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprmsc3flp 2022-05-18T04:47:22.5828149Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprmsc3flp/_remote_module_non_scriptable.py 2022-05-18T04:47:22.5932264Z dist init r=1, world=2 2022-05-18T04:47:22.5937056Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:22.6048774Z dist init r=0, world=2 2022-05-18T04:47:22.6053278Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:22.6054470Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:22.6142869Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:23.9897695Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:23.9898278Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:24.3054220Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:24.3064838Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
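The recurring UserWarning "Module is input on CPU, we are moving it to N ..." is FSDP reporting that the module it wraps still lives on the CPU when sharding begins. A hedged sketch of the two configurations the offload_false / offload_true variants exercise, assuming an already-initialized process group and an illustrative model rather than the one used by these tests:

    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload

    # Placing the module on the current CUDA device before wrapping avoids the
    # "Module is input on CPU" warning; otherwise FSDP moves it and moves it back.
    fsdp_no_offload = FSDP(nn.Linear(8, 8).cuda())

    # The *_offload_true_* tests additionally keep sharded parameters on CPU:
    fsdp_cpu_offload = FSDP(nn.Linear(8, 8), cpu_offload=CPUOffload(offload_params=True))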
2022-05-18T04:47:24.3087416Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:24.3088093Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:24.3100388Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:24.3101042Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:24.3217640Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:24.3218425Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:24.3219570Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:24.3220344Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:24.3225977Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:24.3228919Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:24.3869764Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:24.3870471Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:24.3873719Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:24.3874386Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:24.3967885Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:24.3969109Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:24.7835648Z ok (3.131s) 2022-05-18T04:47:24.7970577Z test_nested_wrapped_model_offload_true_prefetch_pre_no_shard (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45386 2022-05-18T04:47:24.8078987Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45387 2022-05-18T04:47:25.7042766Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7tp0bvy7 2022-05-18T04:47:25.7044308Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7tp0bvy7/_remote_module_non_scriptable.py 2022-05-18T04:47:25.7272659Z dist init r=1, world=2 2022-05-18T04:47:25.7277195Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:25.7410243Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmf6cfwhc 2022-05-18T04:47:25.7412941Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmf6cfwhc/_remote_module_non_scriptable.py 2022-05-18T04:47:25.7634141Z dist init r=0, world=2 2022-05-18T04:47:25.7639020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:25.7640401Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:25.7686722Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:27.1370062Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:27.1370843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:27.4481222Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:27.4490026Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:27.4513972Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:27.4514762Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:27.4524909Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:27.4525544Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:27.4637701Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:27.4638211Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:27.4660369Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:27.4662563Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:47:27.4664578Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:27.4665849Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:27.4667611Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:27.4669009Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:27.4670366Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:27.4671747Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:27.5174730Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:27.5175427Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:27.5181057Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:27.5181747Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:27.5272695Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:27.5273674Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:27.9161206Z ok (3.132s) 2022-05-18T04:47:27.9288711Z test_nested_wrapped_model_offload_true_prefetch_pre_none (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45473 2022-05-18T04:47:27.9394831Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45474 2022-05-18T04:47:28.8392115Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprrd26m68 2022-05-18T04:47:28.8393475Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprrd26m68/_remote_module_non_scriptable.py 2022-05-18T04:47:28.8622635Z dist init r=1, world=2 2022-05-18T04:47:28.8627453Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:28.8752361Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplwr9fpot 2022-05-18T04:47:28.8755085Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplwr9fpot/_remote_module_non_scriptable.py 2022-05-18T04:47:28.8972287Z dist init r=0, world=2 2022-05-18T04:47:28.8976642Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:28.8977744Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:28.9036185Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:30.2830240Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:30.2830789Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:30.5913167Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:30.5923713Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:30.5946597Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:30.5947268Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:30.5959459Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:30.5960103Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:30.6078590Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:30.6079421Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:30.6080557Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:30.6081324Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:30.6086865Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:30.6089218Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:30.6754784Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:30.6755486Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:30.6759031Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:30.6759689Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:30.6855552Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:30.6857334Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:31.0487688Z ok (3.132s) 2022-05-18T04:47:31.0613631Z test_nested_wrapped_model_offload_true_prefetch_pre_shard_grad_op (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45560 2022-05-18T04:47:31.0718309Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45561 2022-05-18T04:47:31.9638929Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6663k_66 2022-05-18T04:47:31.9639852Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6663k_66/_remote_module_non_scriptable.py 2022-05-18T04:47:31.9707706Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp61jba308 2022-05-18T04:47:31.9710643Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp61jba308/_remote_module_non_scriptable.py 2022-05-18T04:47:31.9858424Z dist init r=1, world=2 2022-05-18T04:47:31.9862655Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:31.9940311Z dist init r=0, world=2 2022-05-18T04:47:31.9944632Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:31.9945565Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:31.9965687Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:33.3819974Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:33.3820531Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:33.6953650Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:33.6954235Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
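The paired "Reducer buckets have been rebuilt in this iteration." lines are DistributedDataParallel's INFO message after its first full forward/backward pass; the TestParityWithDDP cases appear to train a DDP reference model alongside the FSDP one. A minimal sketch of that reference wrapping, assuming the process group is initialized and one GPU per rank:

    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    def build_ddp_reference(rank: int) -> DDP:
        model = nn.Linear(8, 8).cuda(rank)
        # After the first iteration DDP re-buckets gradients and logs
        # "Reducer buckets have been rebuilt in this iteration."
        return DDP(model, device_ids=[rank])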
2022-05-18T04:47:33.6986663Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:33.6987326Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:33.6988183Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:33.6988827Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:33.7107315Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:33.7108116Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:33.7110234Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:33.7111031Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:33.7120067Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:33.7120566Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:33.7768238Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:33.7768908Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:33.7772550Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:33.7773494Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:33.7871604Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:33.7872200Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:47:34.1799279Z ok (3.131s) 2022-05-18T04:47:34.1926768Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_None_mixed_precision_False (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45647 2022-05-18T04:47:34.2032741Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45648 2022-05-18T04:47:35.1093717Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwoxmrb49 2022-05-18T04:47:35.1094771Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwoxmrb49/_remote_module_non_scriptable.py 2022-05-18T04:47:35.1319262Z dist init r=0, world=2 2022-05-18T04:47:35.1324020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:35.1442857Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmrj37151 2022-05-18T04:47:35.1445774Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmrj37151/_remote_module_non_scriptable.py 2022-05-18T04:47:35.1674221Z dist init r=1, world=2 2022-05-18T04:47:35.1678899Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:35.1679810Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:35.1732143Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:36.5429235Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:36.5429779Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:36.8522704Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:36.8523405Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:36.8526847Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:36.8527496Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:36.8849679Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:36.8850654Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:36.8851610Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:36.8852264Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:37.2113725Z ok (3.031s) 2022-05-18T04:47:37.2241166Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_None_mixed_precision_True (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45734 2022-05-18T04:47:37.2345369Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45735 2022-05-18T04:47:38.1800520Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2fmz0f5g 2022-05-18T04:47:38.1801964Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2fmz0f5g/_remote_module_non_scriptable.py 2022-05-18T04:47:38.1805776Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp62eqwd56 2022-05-18T04:47:38.1808694Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp62eqwd56/_remote_module_non_scriptable.py 2022-05-18T04:47:38.2025466Z dist init r=1, world=2 2022-05-18T04:47:38.2029887Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:38.2036881Z dist init r=0, world=2 2022-05-18T04:47:38.2041242Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:38.2042691Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:38.2133315Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:39.5760313Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:39.5760860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:39.8873650Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:39.8874348Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:39.8931971Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:39.8932637Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:39.9303548Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:39.9304233Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:39.9307763Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:39.9308448Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:40.2426242Z ok (3.031s) 2022-05-18T04:47:40.2554490Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_NO_SHARD_mixed_precision_False (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45821 2022-05-18T04:47:40.2658208Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45822 2022-05-18T04:47:41.1646583Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp03d7uxpw 2022-05-18T04:47:41.1647761Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp03d7uxpw/_remote_module_non_scriptable.py 2022-05-18T04:47:41.1771523Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8qljnn5k 2022-05-18T04:47:41.1774600Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8qljnn5k/_remote_module_non_scriptable.py 2022-05-18T04:47:41.1868449Z dist init r=1, world=2 2022-05-18T04:47:41.1872636Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:41.2003104Z dist init r=0, world=2 2022-05-18T04:47:41.2007469Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:41.2008933Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:41.2077975Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:42.5926571Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:42.5927569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:42.9061611Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:42.9063002Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:42.9078889Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:42.9080226Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:42.9331523Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:42.9332977Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:42.9334931Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:42.9336303Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:43.2739583Z ok (3.031s) 2022-05-18T04:47:43.2868668Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_NO_SHARD_mixed_precision_True (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45908 2022-05-18T04:47:43.2971015Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45909 2022-05-18T04:47:44.2199449Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphygbbsyr 2022-05-18T04:47:44.2200642Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphygbbsyr/_remote_module_non_scriptable.py 2022-05-18T04:47:44.2322658Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmdsz811f 2022-05-18T04:47:44.2325088Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmdsz811f/_remote_module_non_scriptable.py 2022-05-18T04:47:44.2419979Z dist init r=0, world=2 2022-05-18T04:47:44.2424103Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:44.2543024Z dist init r=1, world=2 2022-05-18T04:47:44.2547091Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:44.2548270Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:44.2629314Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:45.6121921Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:45.6122487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:45.9184368Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:45.9185123Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:45.9220098Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:45.9220741Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:45.9493154Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:45.9493840Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:45.9498791Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:45.9499465Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:46.3060278Z ok (3.032s) 2022-05-18T04:47:46.3189194Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_mixed_precision_False (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 45995 2022-05-18T04:47:46.3296220Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 45996 2022-05-18T04:47:47.2316725Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp289288zu 2022-05-18T04:47:47.2318106Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp289288zu/_remote_module_non_scriptable.py 2022-05-18T04:47:47.2560203Z dist init r=1, world=2 2022-05-18T04:47:47.2565203Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:47.2710556Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzhxsdzlt 2022-05-18T04:47:47.2713165Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzhxsdzlt/_remote_module_non_scriptable.py 2022-05-18T04:47:47.2931961Z dist init r=0, world=2 2022-05-18T04:47:47.2936623Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:47.2937805Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:47.2974193Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:48.6625264Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:48.6625820Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:48.9759285Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:48.9759992Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:48.9761126Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:48.9761767Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:49.0080974Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:49.0081691Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:49.0083410Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:49.0084064Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:49.3375377Z ok (3.031s) 2022-05-18T04:47:49.3504897Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=False)_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_mixed_precision_True (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46082 2022-05-18T04:47:49.3609434Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46083 2022-05-18T04:47:50.2632639Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbr_xxbpt 2022-05-18T04:47:50.2633590Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbr_xxbpt/_remote_module_non_scriptable.py 2022-05-18T04:47:50.2698207Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnyi1uvpe 2022-05-18T04:47:50.2701609Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnyi1uvpe/_remote_module_non_scriptable.py 2022-05-18T04:47:50.2853344Z dist init r=1, world=2 2022-05-18T04:47:50.2857474Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:50.2945938Z dist init r=0, world=2 2022-05-18T04:47:50.2950638Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:50.2952016Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:50.2960536Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:51.6833921Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:51.6834471Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:51.9954027Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:51.9954741Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:51.9965708Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:51.9966369Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:52.0316602Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:52.0317656Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:52.0318606Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:52.0319361Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:52.3687905Z ok (3.031s) 2022-05-18T04:47:52.3817412Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_None_mixed_precision_False (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46169 2022-05-18T04:47:52.3921159Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46170 2022-05-18T04:47:53.3391732Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxm5srfms 2022-05-18T04:47:53.3392958Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxm5srfms/_remote_module_non_scriptable.py 2022-05-18T04:47:53.3409343Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0atrwuti 2022-05-18T04:47:53.3412486Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0atrwuti/_remote_module_non_scriptable.py 2022-05-18T04:47:53.3616856Z dist init r=0, world=2 2022-05-18T04:47:53.3621175Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:53.3642943Z dist init r=1, world=2 2022-05-18T04:47:53.3647378Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:53.3648450Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:53.3724912Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:54.7585014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:54.7585534Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:55.1000719Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:55.1002058Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:55.1012702Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:55.1014073Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:55.1129172Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:47:55.1131138Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:55.1133418Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 
2022-05-18T04:47:55.1135401Z buf = torch.clone(d_p).detach() 2022-05-18T04:47:55.1519064Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:55.1520703Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:55.1522664Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:55.1524025Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:55.5004224Z ok (3.131s) 2022-05-18T04:47:55.5132832Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_None_mixed_precision_True (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46256 2022-05-18T04:47:55.5236788Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46257 2022-05-18T04:47:56.4236812Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx9k1iua7 2022-05-18T04:47:56.4237716Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx9k1iua7/_remote_module_non_scriptable.py 2022-05-18T04:47:56.4258449Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0ve7q705 2022-05-18T04:47:56.4261337Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0ve7q705/_remote_module_non_scriptable.py 2022-05-18T04:47:56.4459656Z dist init r=0, world=2 2022-05-18T04:47:56.4463579Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:56.4490810Z dist init r=1, world=2 2022-05-18T04:47:56.4495400Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:56.4496648Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:56.4567061Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:57.8337819Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:47:57.8338356Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:47:58.1475508Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:47:58.1476220Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:58.1514096Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:47:58.1514736Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:47:58.1566021Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:58.1567324Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:58.1568923Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:58.1570592Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:58.1592713Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:58.1593988Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:58.1595253Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:58.1596500Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:47:58.2073581Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:58.2074267Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:58.2078487Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:47:58.2079155Z warnings.warn(msg, FutureWarning) 2022-05-18T04:47:58.5316977Z ok (3.031s) 2022-05-18T04:47:58.5446589Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_NO_SHARD_mixed_precision_False (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46343 2022-05-18T04:47:58.5554761Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46344 2022-05-18T04:47:59.4600294Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj62md_lh 2022-05-18T04:47:59.4601150Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj62md_lh/_remote_module_non_scriptable.py 2022-05-18T04:47:59.4621781Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbxwjfz8j 2022-05-18T04:47:59.4624582Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbxwjfz8j/_remote_module_non_scriptable.py 2022-05-18T04:47:59.4820456Z dist init r=0, world=2 2022-05-18T04:47:59.4824826Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:47:59.4851550Z dist init r=1, world=2 2022-05-18T04:47:59.4856053Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:47:59.4857344Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:47:59.4928498Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:00.8526329Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:00.8526853Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:01.1618738Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:01.1619463Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:01.1690140Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:01.1691085Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:01.1804508Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:01.1805828Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:48:01.1807122Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:01.1808403Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:01.1809663Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:01.1811162Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:01.1812426Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:01.1813679Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:01.2130592Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:01.2131438Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:01.2136529Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:01.2137218Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:01.5632392Z ok (3.031s) 2022-05-18T04:48:01.5766730Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_NO_SHARD_mixed_precision_True (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46430 2022-05-18T04:48:01.5874193Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46431 2022-05-18T04:48:02.4937030Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3e2d1az5 2022-05-18T04:48:02.4938276Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3e2d1az5/_remote_module_non_scriptable.py 2022-05-18T04:48:02.5168041Z dist init r=0, world=2 2022-05-18T04:48:02.5172406Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:02.5293982Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz7fydm58 2022-05-18T04:48:02.5296433Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz7fydm58/_remote_module_non_scriptable.py 2022-05-18T04:48:02.5515600Z dist init r=1, world=2 2022-05-18T04:48:02.5519338Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:02.5520457Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:02.5581336Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:03.9372394Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:03.9372911Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:04.2503761Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:04.2504503Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:04.2514102Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:04.2514733Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:04.2576078Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:04.2577382Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:04.2579099Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:04.2580384Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:04.2586036Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:04.2587326Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:04.2588595Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:04.2589860Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:04.2986349Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:04.2987043Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:04.2992461Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:04.2993130Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:04.6955178Z ok (3.132s) 2022-05-18T04:48:04.7083900Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_mixed_precision_False (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46517 2022-05-18T04:48:04.7188779Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46518 2022-05-18T04:48:05.6142230Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuihvyqbm 2022-05-18T04:48:05.6143243Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuihvyqbm/_remote_module_non_scriptable.py 2022-05-18T04:48:05.6373416Z dist init r=1, world=2 2022-05-18T04:48:05.6377952Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:05.6435670Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp41iaq6q_ 2022-05-18T04:48:05.6438591Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp41iaq6q_/_remote_module_non_scriptable.py 2022-05-18T04:48:05.6661529Z dist init r=0, world=2 2022-05-18T04:48:05.6665878Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:05.6666929Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:05.6684788Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:07.0492418Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:07.0492943Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:07.3585785Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:07.3586503Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:07.3587600Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:07.3588242Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:07.3702904Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 2022-05-18T04:48:07.3703980Z buf = torch.clone(d_p).detach() 2022-05-18T04:48:07.3705122Z /opt/conda/lib/python3.7/site-packages/torch/optim/sgd.py:230: UserWarning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (Triggered internally at /var/lib/jenkins/workspace/torch/csrc/autograd/python_variable.cpp:205.) 
2022-05-18T04:48:07.3705896Z buf = torch.clone(d_p).detach() 2022-05-18T04:48:07.4093670Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:07.4094557Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:07.4096491Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:07.4097147Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:07.7267449Z ok (3.031s) 2022-05-18T04:48:07.7396045Z test_nested_wrapped_model_single_iteration_mixed_precision_cpu_offload_CPUOffload(offload_params=True)_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP_mixed_precision_True (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46604 2022-05-18T04:48:07.7499974Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46605 2022-05-18T04:48:08.6543333Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph5desbx5 2022-05-18T04:48:08.6544225Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph5desbx5/_remote_module_non_scriptable.py 2022-05-18T04:48:08.6591839Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpluaikdq8 2022-05-18T04:48:08.6594586Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpluaikdq8/_remote_module_non_scriptable.py 2022-05-18T04:48:08.6763134Z dist init r=0, world=2 2022-05-18T04:48:08.6767303Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:08.6824793Z dist init r=1, world=2 2022-05-18T04:48:08.6829249Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:08.6830229Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:08.6871227Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:10.0423947Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:10.0424520Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:10.3570994Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:10.3571743Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:10.3587141Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:48:10.3588047Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:10.3642407Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:10.3643937Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:10.3645216Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:10.3646483Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:10.3666233Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:10.3667823Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:10.3669466Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:10.3670741Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:48:10.4131822Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:10.4132617Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:10.4136736Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:10.4137687Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:10.7581549Z ok (3.031s) 2022-05-18T04:48:10.7707642Z test_transformer_parameterized_offload_false_none_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46691 2022-05-18T04:48:10.7814332Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46692 2022-05-18T04:48:11.6856465Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2wbe45ln 2022-05-18T04:48:11.6857526Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2wbe45ln/_remote_module_non_scriptable.py 2022-05-18T04:48:11.7068580Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuezoud2z 2022-05-18T04:48:11.7071626Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuezoud2z/_remote_module_non_scriptable.py 2022-05-18T04:48:11.7086518Z dist init r=1, world=2 2022-05-18T04:48:11.7091203Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:11.7303397Z dist init r=0, world=2 2022-05-18T04:48:11.7308121Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:11.7309108Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:11.7398072Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:13.1161157Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:13.1161699Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:13.7036228Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:13.7058027Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:13.7306566Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:13.7307248Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:13.7331208Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:13.7332305Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:13.8054270Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:13.8055006Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:13.8070146Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. 
Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:13.8070833Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:13.8647537Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:13.8655705Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:14.3905793Z ok (3.632s) 2022-05-18T04:48:14.4030339Z test_transformer_parameterized_offload_false_none_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46778 2022-05-18T04:48:14.4134782Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46779 2022-05-18T04:48:15.3158642Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy_c9isg6 2022-05-18T04:48:15.3159583Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy_c9isg6/_remote_module_non_scriptable.py 2022-05-18T04:48:15.3201138Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa2aov7cw 2022-05-18T04:48:15.3204159Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa2aov7cw/_remote_module_non_scriptable.py 2022-05-18T04:48:15.3381971Z dist init r=0, world=2 2022-05-18T04:48:15.3386533Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:15.3435292Z dist init r=1, world=2 2022-05-18T04:48:15.3439835Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:15.3441186Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:15.3489843Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:16.7339187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:16.7339736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:17.3121259Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:17.3122067Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:17.3383942Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:17.3384762Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:17.3396886Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:17.3397763Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:17.4109964Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
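The FutureWarning repeated throughout this run is a plain rename: torch.testing.assert_allclose gives way to torch.testing.assert_close. A minimal sketch of the migration (tensor values and tolerances here are illustrative):

    import torch
    from torch.testing import assert_close

    a = torch.tensor([1.0, 2.0])
    b = torch.tensor([1.0, 2.0 + 1e-7])

    # before: torch.testing.assert_allclose(a, b)
    assert_close(a, b)                        # default dtype-based tolerances
    assert_close(a, b, rtol=0.0, atol=1e-6)   # or explicit tolerances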
2022-05-18T04:48:17.4110726Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:17.4117707Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:17.4118444Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:17.4681708Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:17.4690742Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:17.9225149Z ok (3.532s) 2022-05-18T04:48:17.9351373Z test_transformer_parameterized_offload_false_none_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46865 2022-05-18T04:48:17.9454197Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46866 2022-05-18T04:48:18.8433293Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpon8acf6n 2022-05-18T04:48:18.8434711Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpon8acf6n/_remote_module_non_scriptable.py 2022-05-18T04:48:18.8444298Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5j3bf5yz 2022-05-18T04:48:18.8447047Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5j3bf5yz/_remote_module_non_scriptable.py 2022-05-18T04:48:18.8664064Z dist init r=1, world=2 2022-05-18T04:48:18.8668247Z dist init r=0, world=2 2022-05-18T04:48:18.8668813Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:18.8672478Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:18.8673257Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:18.8772356Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:20.2352831Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:20.2353409Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:20.8207396Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:20.8228857Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:20.8475319Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:20.8475997Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:20.8501444Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:48:20.8502083Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:20.9264769Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:20.9265794Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:20.9290726Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:20.9291821Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:20.9878136Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:20.9882216Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:21.4545531Z ok (3.532s) 2022-05-18T04:48:21.4670633Z test_transformer_parameterized_offload_false_none_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 46952 2022-05-18T04:48:21.4774581Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 46953 2022-05-18T04:48:22.3788464Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphwd1r1uo 2022-05-18T04:48:22.3789644Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphwd1r1uo/_remote_module_non_scriptable.py 2022-05-18T04:48:22.3803189Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppq3spen0 2022-05-18T04:48:22.3806312Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppq3spen0/_remote_module_non_scriptable.py 2022-05-18T04:48:22.4008233Z dist init r=0, world=2 2022-05-18T04:48:22.4012656Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:22.4036318Z dist init r=1, world=2 2022-05-18T04:48:22.4040810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:22.4041868Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:22.4116272Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:23.7850312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:23.7851059Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:24.3734647Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:24.3758789Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:24.3996078Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
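The UserWarning from fully_sharded_data_parallel.py:912 fires because the module handed to FSDP still lives on CPU, so FSDP temporarily moves it to the current CUDA device for parameter verification, flattening and sharding, then moves it back. A hedged way to avoid that round trip is to place the module on the local device before wrapping (assumes CUDA and the process group are already set up, and that CPU offload is not requested):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def wrap_on_gpu(module: torch.nn.Module) -> FSDP:
        device = torch.device("cuda", torch.cuda.current_device())
        # Module already on the target device, so no CPU->GPU->CPU shuffle
        # happens at wrap time and the warning above is not emitted.
        return FSDP(module.to(device))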
2022-05-18T04:48:24.3996752Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:24.4034904Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:24.4035554Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:24.4800816Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:24.4801493Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:24.4822034Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:24.4822993Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:24.5407650Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:24.5421845Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:24.9866415Z ok (3.532s) 2022-05-18T04:48:24.9994970Z test_transformer_parameterized_offload_false_none_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47039 2022-05-18T04:48:25.0102602Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47040 2022-05-18T04:48:25.9146818Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpuypljg5l 2022-05-18T04:48:25.9148146Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpuypljg5l/_remote_module_non_scriptable.py 2022-05-18T04:48:25.9378611Z dist init r=1, world=2 2022-05-18T04:48:25.9383213Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:25.9475435Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyqyun7fz 2022-05-18T04:48:25.9478395Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyqyun7fz/_remote_module_non_scriptable.py 2022-05-18T04:48:25.9693718Z dist init r=0, world=2 2022-05-18T04:48:25.9698285Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:25.9699179Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:25.9791878Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:27.3553534Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:27.3554117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:27.9435554Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:27.9457813Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:48:27.9696425Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:27.9697192Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:27.9733026Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:27.9733676Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:28.0499321Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:28.0500018Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:28.0520584Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:28.0521237Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:28.1108394Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:28.1121182Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:28.6194587Z ok (3.633s) 2022-05-18T04:48:28.6320888Z test_transformer_parameterized_offload_false_none_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47126 2022-05-18T04:48:28.6425166Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47127 2022-05-18T04:48:29.5537743Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxj6jjkxx 2022-05-18T04:48:29.5539163Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxj6jjkxx/_remote_module_non_scriptable.py 2022-05-18T04:48:29.5772160Z dist init r=1, world=2 2022-05-18T04:48:29.5776913Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:29.5995627Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1g_2m3v1 2022-05-18T04:48:29.5998322Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1g_2m3v1/_remote_module_non_scriptable.py 2022-05-18T04:48:29.6223976Z dist init r=0, world=2 2022-05-18T04:48:29.6228308Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:29.6229443Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:29.6287374Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
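The "dist init r=N, world=2" and store-based-barrier lines come from each rank calling torch.distributed.init_process_group; the barrier key is an internal rendezvous detail logged by c10d. Roughly, each of the two worker processes does something like the following (address, port and backend are illustrative, not necessarily what the harness uses):

    import os
    import torch.distributed as dist

    def init_two_rank_group(rank: int) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        # Emits the "Added key: store_based_barrier_key:1" and "Completed
        # store-based barrier" INFO lines seen above once both ranks arrive.
        dist.init_process_group(backend="nccl", rank=rank, world_size=2)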
2022-05-18T04:48:31.0139148Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:31.0139701Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:31.5993289Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:31.6013352Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:31.6259288Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:31.6259973Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:31.6290423Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:31.6291265Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:31.7067855Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:31.7068546Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:31.7083005Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:31.7083678Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:31.7685152Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:31.7689378Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:32.2516032Z ok (3.632s) 2022-05-18T04:48:32.2642422Z test_transformer_parameterized_offload_false_prefetch_post_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47213 2022-05-18T04:48:32.2745382Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47214 2022-05-18T04:48:33.1842233Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6v9n2cr2 2022-05-18T04:48:33.1843653Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6v9n2cr2/_remote_module_non_scriptable.py 2022-05-18T04:48:33.2073393Z dist init r=1, world=2 2022-05-18T04:48:33.2077882Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:33.2185798Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwwsinf64 2022-05-18T04:48:33.2188676Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwwsinf64/_remote_module_non_scriptable.py 2022-05-18T04:48:33.2407076Z dist init r=0, world=2 2022-05-18T04:48:33.2411423Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:33.2412770Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:33.2486879Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:34.6273514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:34.6274035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:35.2184091Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:35.2203660Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:35.2453813Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:35.2454532Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:35.2476319Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:35.2476968Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:35.3197916Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:35.3198595Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:35.3207847Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:35.3208505Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:35.3795905Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
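The "Reducer buckets have been rebuilt in this iteration" INFO lines are emitted by DistributedDataParallel's gradient reducer after the first backward pass; TestParityWithDDP uses a DDP-wrapped copy of the model as its reference. A bare-bones sketch of that reference wrapping (rank handling simplified, process group assumed initialized):

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_reference(module: torch.nn.Module, rank: int) -> DDP:
        # Gradient buckets are laid out lazily, hence the one-time
        # "rebuilt in this iteration" message during the first step.
        return DDP(module.cuda(rank), device_ids=[rank])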
2022-05-18T04:48:35.3798288Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:35.8837501Z ok (3.632s) 2022-05-18T04:48:35.8969406Z test_transformer_parameterized_offload_false_prefetch_post_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47300 2022-05-18T04:48:35.9074741Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47301 2022-05-18T04:48:36.8024303Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0raz_wd1 2022-05-18T04:48:36.8025416Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0raz_wd1/_remote_module_non_scriptable.py 2022-05-18T04:48:36.8254564Z dist init r=1, world=2 2022-05-18T04:48:36.8259085Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:36.8546687Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpktxpp760 2022-05-18T04:48:36.8549425Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpktxpp760/_remote_module_non_scriptable.py 2022-05-18T04:48:36.8775965Z dist init r=0, world=2 2022-05-18T04:48:36.8780310Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:36.8781818Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:36.8871164Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:38.2675094Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:38.2675647Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:38.8500614Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:38.8501188Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:38.8760803Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:38.8761469Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:38.8780986Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:38.8781655Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:38.9525302Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:38.9525975Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:38.9544220Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:38.9544897Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:39.0153659Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:39.0154208Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:39.5168228Z ok (3.633s) 2022-05-18T04:48:39.5298595Z test_transformer_parameterized_offload_false_prefetch_post_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47387 2022-05-18T04:48:39.5407237Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47388 2022-05-18T04:48:40.4362908Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi89t7f60 2022-05-18T04:48:40.4364159Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi89t7f60/_remote_module_non_scriptable.py 2022-05-18T04:48:40.4370199Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv2ffp3xr 2022-05-18T04:48:40.4373296Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv2ffp3xr/_remote_module_non_scriptable.py 2022-05-18T04:48:40.4592148Z dist init r=1, world=2 2022-05-18T04:48:40.4595303Z dist init r=0, world=2 2022-05-18T04:48:40.4596723Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:40.4599878Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:40.4601035Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:40.4700902Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:41.8417577Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:41.8418126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:42.4282295Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:42.4303014Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:42.4545957Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:42.4546616Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:42.4577184Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:42.4577830Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:42.5345580Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:48:42.5346256Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:42.5362328Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:42.5362994Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:42.5949131Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:42.5957009Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:43.0497477Z ok (3.533s) 2022-05-18T04:48:43.0624043Z test_transformer_parameterized_offload_false_prefetch_post_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47474 2022-05-18T04:48:43.0727936Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47475 2022-05-18T04:48:43.9795752Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp89pul6zy 2022-05-18T04:48:43.9796658Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp89pul6zy/_remote_module_non_scriptable.py 2022-05-18T04:48:43.9830951Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd10pdxe_ 2022-05-18T04:48:43.9833841Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd10pdxe_/_remote_module_non_scriptable.py 2022-05-18T04:48:44.0019609Z dist init r=0, world=2 2022-05-18T04:48:44.0023867Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:44.0062590Z dist init r=1, world=2 2022-05-18T04:48:44.0067107Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:44.0068353Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:44.0127509Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:45.3967872Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:45.3968487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:45.9817654Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:45.9818210Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:46.0077176Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:46.0077822Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:46.0086895Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T04:48:46.0087536Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:46.0846762Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:46.0847456Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:46.0864604Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:46.0865286Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:46.1444616Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:46.1445121Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:46.6822402Z ok (3.632s) 2022-05-18T04:48:46.6951296Z test_transformer_parameterized_offload_false_prefetch_post_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47561 2022-05-18T04:48:46.7057294Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47562 2022-05-18T04:48:47.5978924Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2xpz4ne9 2022-05-18T04:48:47.5979590Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2xpz4ne9/_remote_module_non_scriptable.py 2022-05-18T04:48:47.6086962Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7uj_5otq 2022-05-18T04:48:47.6089811Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7uj_5otq/_remote_module_non_scriptable.py 2022-05-18T04:48:47.6198779Z dist init r=0, world=2 2022-05-18T04:48:47.6202957Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:47.6321115Z dist init r=1, world=2 2022-05-18T04:48:47.6325731Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:47.6326743Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:47.6407964Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:49.0191890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:49.0192710Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:49.6092303Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:49.6092886Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:49.6362302Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
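The clip_norm_type_2_0 / clip_norm_type_None suffixes in these test names toggle gradient clipping: an L2-norm clip versus no clipping at all. A rough sketch of what that toggle might look like for an FSDP-wrapped model; the function, max_norm value, and argument names are illustrative, and FSDP's own sharding-aware clip_grad_norm_ method is assumed:

    def step_with_optional_clip(model, optimizer, loss, norm_type=2.0):
        loss.backward()
        if norm_type is not None:
            # FSDP's clip_grad_norm_ computes the total norm across shards.
            model.clip_grad_norm_(max_norm=1.0, norm_type=norm_type)
        optimizer.step()
        optimizer.zero_grad()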
2022-05-18T04:48:49.6363001Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:49.6363849Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:49.6364463Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:49.7119110Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:49.7119883Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:49.7126267Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:49.7126988Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:49.7710069Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:49.7710799Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:50.3149905Z ok (3.633s) 2022-05-18T04:48:50.3276317Z test_transformer_parameterized_offload_false_prefetch_post_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47648 2022-05-18T04:48:50.3382057Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47649 2022-05-18T04:48:51.2515508Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0y592_af 2022-05-18T04:48:51.2516151Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0y592_af/_remote_module_non_scriptable.py 2022-05-18T04:48:51.2543449Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxaubhuke 2022-05-18T04:48:51.2546319Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxaubhuke/_remote_module_non_scriptable.py 2022-05-18T04:48:51.2737550Z dist init r=1, world=2 2022-05-18T04:48:51.2741871Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:51.2775553Z dist init r=0, world=2 2022-05-18T04:48:51.2779891Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:51.2781330Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:51.2845659Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:52.6807420Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:52.6807982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:53.2705075Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:53.2705941Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:48:53.2973433Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:53.2974101Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:53.2974950Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:53.2975604Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:53.3734022Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:53.3734718Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:53.3739265Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:53.3739935Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:53.4324598Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:53.4325107Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:53.9476907Z ok (3.633s) 2022-05-18T04:48:53.9607019Z test_transformer_parameterized_offload_false_prefetch_pre_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47735 2022-05-18T04:48:53.9716438Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47736 2022-05-18T04:48:54.8627605Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkxgm1off 2022-05-18T04:48:54.8628902Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkxgm1off/_remote_module_non_scriptable.py 2022-05-18T04:48:54.8706553Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl4fowstm 2022-05-18T04:48:54.8709575Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl4fowstm/_remote_module_non_scriptable.py 2022-05-18T04:48:54.8850223Z dist init r=0, world=2 2022-05-18T04:48:54.8854998Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:54.8940468Z dist init r=1, world=2 2022-05-18T04:48:54.8944941Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:54.8945911Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:54.8958357Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
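The prefetch_pre / prefetch_post / none variants in the test names select FSDP's backward prefetching policy, i.e. when the next parameter all-gather is issued during the backward pass. A hedged sketch, assuming the BackwardPrefetch enum exported by torch.distributed.fsdp in recent builds:

    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from torch.distributed.fsdp import BackwardPrefetch

    def wrap_with_prefetch(module: nn.Module, pre: bool = True) -> FSDP:
        # BACKWARD_PRE issues the next all-gather before the current
        # gradient computation; BACKWARD_POST issues it after. Passing
        # backward_prefetch=None disables prefetching ("none" in test names).
        policy = (BackwardPrefetch.BACKWARD_PRE if pre
                  else BackwardPrefetch.BACKWARD_POST)
        return FSDP(module, backward_prefetch=policy)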
2022-05-18T04:48:56.2845655Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:56.2846178Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:48:56.8666281Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:56.8666846Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:56.8932607Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:56.8933315Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:56.8934149Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:48:56.8934779Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:48:56.9654727Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:56.9655641Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:56.9657442Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:48:56.9658105Z warnings.warn(msg, FutureWarning) 2022-05-18T04:48:57.0229764Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:57.0230270Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:48:57.4805475Z ok (3.533s) 2022-05-18T04:48:57.4932542Z test_transformer_parameterized_offload_false_prefetch_pre_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47822 2022-05-18T04:48:57.5037817Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47823 2022-05-18T04:48:58.4037807Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzslqeh4r 2022-05-18T04:48:58.4038437Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzslqeh4r/_remote_module_non_scriptable.py 2022-05-18T04:48:58.4049752Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7p4bvb82 2022-05-18T04:48:58.4053163Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7p4bvb82/_remote_module_non_scriptable.py 2022-05-18T04:48:58.4260781Z dist init r=0, world=2 2022-05-18T04:48:58.4265099Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:48:58.4281633Z dist init r=1, world=2 2022-05-18T04:48:58.4286098Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:48:58.4287304Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:58.4368541Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:48:59.8174970Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:48:59.8175502Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:00.4101312Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:00.4121689Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:00.4365397Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:00.4366396Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:00.4395876Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:00.4396544Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:00.5122519Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:00.5123241Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:00.5136471Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:00.5137139Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:00.5716383Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:49:00.5719955Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:01.0129098Z ok (3.532s) 2022-05-18T04:49:01.0259191Z test_transformer_parameterized_offload_false_prefetch_pre_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47909 2022-05-18T04:49:01.0364141Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47910 2022-05-18T04:49:01.9408197Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5rlq0xhj 2022-05-18T04:49:01.9409277Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5rlq0xhj/_remote_module_non_scriptable.py 2022-05-18T04:49:01.9639859Z dist init r=1, world=2 2022-05-18T04:49:01.9644544Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:01.9844403Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpocjxfhxa 2022-05-18T04:49:01.9847102Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpocjxfhxa/_remote_module_non_scriptable.py 2022-05-18T04:49:02.0068996Z dist init r=0, world=2 2022-05-18T04:49:02.0073429Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:02.0074547Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:02.0155219Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:03.3939081Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:03.3939629Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:03.9864481Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:03.9886307Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:04.0134514Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:04.0135190Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:04.0161419Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:04.0162071Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:04.0933038Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:04.0933724Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:04.0943606Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. 
For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:04.0944275Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:04.1535196Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:04.1535683Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:04.6458001Z ok (3.633s) 2022-05-18T04:49:04.6583533Z test_transformer_parameterized_offload_false_prefetch_pre_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 47996 2022-05-18T04:49:04.6687335Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 47997 2022-05-18T04:49:05.5829331Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpstzepm1d 2022-05-18T04:49:05.5830284Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpstzepm1d/_remote_module_non_scriptable.py 2022-05-18T04:49:05.6051534Z dist init r=0, world=2 2022-05-18T04:49:05.6055894Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:05.6249596Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6p5qiv9x 2022-05-18T04:49:05.6253063Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6p5qiv9x/_remote_module_non_scriptable.py 2022-05-18T04:49:05.6482078Z dist init r=1, world=2 2022-05-18T04:49:05.6486809Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:05.6488099Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:05.6565883Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:07.0258021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:07.0258586Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:07.6037611Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:07.6038828Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:07.6301729Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:07.6302411Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:07.6305181Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:07.6306141Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:07.7055885Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 
2022-05-18T04:49:07.7056820Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:07.7057790Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:07.7058424Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:07.7623041Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:07.7623576Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:08.2780547Z ok (3.632s) 2022-05-18T04:49:08.2906766Z test_transformer_parameterized_offload_false_prefetch_pre_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48083 2022-05-18T04:49:08.3011597Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48084 2022-05-18T04:49:09.2040583Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6zbmapzp 2022-05-18T04:49:09.2044611Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6zbmapzp/_remote_module_non_scriptable.py 2022-05-18T04:49:09.2065072Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvth99r6d 2022-05-18T04:49:09.2067812Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvth99r6d/_remote_module_non_scriptable.py 2022-05-18T04:49:09.2267911Z dist init r=0, world=2 2022-05-18T04:49:09.2271791Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:09.2295401Z dist init r=1, world=2 2022-05-18T04:49:09.2299915Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:09.2301081Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:09.2374526Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:10.5991760Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:10.5992307Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:11.1896240Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:11.1896822Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:11.2168646Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:11.2169336Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:11.2171492Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
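The FutureWarning repeated in the output above comes from the test suite still calling torch.testing.assert_allclose. A minimal sketch of the migration the warning asks for, using illustrative tensors rather than anything from the test itself:

    import torch
    from torch.testing import assert_close

    actual = torch.randn(4)
    expected = actual.clone()

    # Deprecated form flagged by the FutureWarning above:
    # torch.testing.assert_allclose(actual, expected)

    # Suggested replacement; raises an AssertionError on mismatch.
    assert_close(actual, expected)

    # Explicit tolerances are still accepted if the defaults are too strict.
    assert_close(actual, expected, rtol=1.3e-6, atol=1e-5)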
2022-05-18T04:49:11.2172381Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:11.2961792Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:11.2962792Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:11.2968411Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:11.2969124Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:11.3572370Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:11.3573117Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:11.9101643Z ok (3.632s) 2022-05-18T04:49:11.9228240Z test_transformer_parameterized_offload_false_prefetch_pre_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48170 2022-05-18T04:49:11.9334002Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48171 2022-05-18T04:49:12.8342699Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_no4rgto 2022-05-18T04:49:12.8343533Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_no4rgto/_remote_module_non_scriptable.py 2022-05-18T04:49:12.8573216Z dist init r=1, world=2 2022-05-18T04:49:12.8577578Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:12.8704611Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7vjvgm48 2022-05-18T04:49:12.8706967Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7vjvgm48/_remote_module_non_scriptable.py 2022-05-18T04:49:12.8924200Z dist init r=0, world=2 2022-05-18T04:49:12.8928510Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:12.8929650Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:12.8986318Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:14.2710435Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:14.2710977Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:14.8571694Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:14.8595757Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:14.8836202Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
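The UserWarning from fully_sharded_data_parallel.py:912 fires because the module handed to FSDP still lives on the CPU, so FSDP moves it to the local GPU and back. A hedged sketch of how a caller would avoid that round trip by moving the module first; the Linear module here is a placeholder, not the transformer used by the test:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes a process group has already been initialized, as in the logs above.
    model = nn.Linear(8, 8)

    # Placing the module on the local device before wrapping avoids the
    # "Module is input on CPU, we are moving it to ..." warning.
    model = model.to(torch.cuda.current_device())
    fsdp_model = FSDP(model)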
2022-05-18T04:49:14.8836897Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:14.8871871Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:14.8872516Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:14.9637824Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:14.9638543Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:14.9659512Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:14.9660442Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:15.0246609Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:15.0258713Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:15.5422927Z ok (3.632s) 2022-05-18T04:49:15.5548928Z test_transformer_parameterized_offload_true_none_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48257 2022-05-18T04:49:15.5652168Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48258 2022-05-18T04:49:16.4620783Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprm9u8qtr 2022-05-18T04:49:16.4621819Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprm9u8qtr/_remote_module_non_scriptable.py 2022-05-18T04:49:16.4686540Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkk7a9ba6 2022-05-18T04:49:16.4689475Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkk7a9ba6/_remote_module_non_scriptable.py 2022-05-18T04:49:16.4841424Z dist init r=1, world=2 2022-05-18T04:49:16.4845576Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:16.4916920Z dist init r=0, world=2 2022-05-18T04:49:16.4921154Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:16.4922468Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:16.4948873Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:17.8725554Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:17.8726117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:18.4676839Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:18.4677392Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
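The INFO line "Reducer buckets have been rebuilt in this iteration" is printed by DistributedDataParallel after the first backward pass, when it re-packs gradients into communication buckets based on the order in which they were produced. A minimal sketch of the DDP wrapping that emits it; the bucket size shown is just the documented default, not a value taken from the test:

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    local_rank = torch.cuda.current_device()
    model = nn.Linear(8, 8).cuda(local_rank)

    # Gradients are grouped into roughly bucket_cap_mb-sized buckets for
    # all-reduce; the bucket layout is rebuilt once after the first
    # iteration, producing the INFO message seen above.
    ddp_model = DDP(model, device_ids=[local_rank], bucket_cap_mb=25)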
2022-05-18T04:49:18.4940990Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:18.4941661Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:18.4951042Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:18.4951701Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:18.5035380Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:18.5051961Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:18.5358972Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:18.5359774Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:18.6392414Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:18.6393126Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:18.6413065Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:18.6413727Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:18.7007981Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:18.7008496Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:18.7196646Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:18.7201713Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:49:19.1758688Z ok (3.633s) 2022-05-18T04:49:19.1882397Z test_transformer_parameterized_offload_true_none_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48344 2022-05-18T04:49:19.1986153Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48345 2022-05-18T04:49:20.1004919Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgybnd026 2022-05-18T04:49:20.1006174Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgybnd026/_remote_module_non_scriptable.py 2022-05-18T04:49:20.1238482Z dist init r=0, world=2 2022-05-18T04:49:20.1242695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:20.1418063Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnqiihusr 2022-05-18T04:49:20.1420592Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnqiihusr/_remote_module_non_scriptable.py 2022-05-18T04:49:20.1639627Z dist init r=1, world=2 2022-05-18T04:49:20.1643824Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:20.1644895Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:20.1650885Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:21.5566429Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:21.5566999Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:22.1439865Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:22.1440403Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:22.1708279Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:22.1709234Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:22.1711931Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:22.1712585Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:22.1803003Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:22.1808779Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:49:22.2114061Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:22.2114597Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:22.3136192Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:22.3136883Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:22.3143038Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:22.3143725Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:22.3724051Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:22.3724572Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:22.3914361Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:22.3917528Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:22.9081773Z ok (3.732s) 2022-05-18T04:49:22.9211727Z test_transformer_parameterized_offload_true_none_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48431 2022-05-18T04:49:22.9320185Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48432 2022-05-18T04:49:23.8224154Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmijskocl 2022-05-18T04:49:23.8224999Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmijskocl/_remote_module_non_scriptable.py 2022-05-18T04:49:23.8268659Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6ip25f1z 2022-05-18T04:49:23.8271291Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6ip25f1z/_remote_module_non_scriptable.py 2022-05-18T04:49:23.8452211Z dist init r=1, world=2 2022-05-18T04:49:23.8456723Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:23.8487128Z dist init r=0, world=2 2022-05-18T04:49:23.8491396Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:23.8492758Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:23.8560471Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
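The "dist init r=N, world=2", "Added key: store_based_barrier_key:1" and "Completed store-based barrier" lines are all emitted while each of the two ranks joins the process group. A minimal sketch of that initialization for one rank; the master address and port are assumptions, not values from the CI job:

    import os
    import torch.distributed as dist

    def init_rank(rank: int, world_size: int = 2) -> None:
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        # The store-based barrier logged above runs inside init_process_group:
        # each rank adds a key to the shared store and waits until all
        # world_size ranks have done the same.
        dist.init_process_group("nccl", rank=rank, world_size=world_size)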
2022-05-18T04:49:25.2337177Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:25.2337712Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:25.8148166Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:25.8169604Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:25.8410654Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:25.8411552Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:25.8444926Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:25.8445569Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:25.8508210Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:25.8549908Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:25.8847803Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:25.8858164Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:25.9920401Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:25.9921097Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:25.9942470Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:25.9943146Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:26.0526243Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:26.0535118Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:26.0710883Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:49:26.0729149Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:26.5411901Z ok (3.633s) 2022-05-18T04:49:26.5538499Z test_transformer_parameterized_offload_true_none_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48518 2022-05-18T04:49:26.5641914Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48519 2022-05-18T04:49:27.4546236Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn1qsb4tr 2022-05-18T04:49:27.4547517Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn1qsb4tr/_remote_module_non_scriptable.py 2022-05-18T04:49:27.4629424Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz_hm2s3u 2022-05-18T04:49:27.4632124Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz_hm2s3u/_remote_module_non_scriptable.py 2022-05-18T04:49:27.4767486Z dist init r=0, world=2 2022-05-18T04:49:27.4771864Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:27.4861128Z dist init r=1, world=2 2022-05-18T04:49:27.4865456Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:27.4866411Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:27.4875151Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:28.8508602Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:28.8509148Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:29.4371383Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:29.4391999Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:29.4635061Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:29.4635736Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:29.4667475Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:29.4668139Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:29.4733991Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:49:29.4771086Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:29.5069086Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:29.5078462Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:29.6138169Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:29.6138912Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:29.6155845Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:29.6156512Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:29.6739822Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:29.6749053Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:29.6925867Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:29.6943181Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:30.1735915Z ok (3.632s) 2022-05-18T04:49:30.1862952Z test_transformer_parameterized_offload_true_none_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48605 2022-05-18T04:49:30.1967020Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48606 2022-05-18T04:49:31.0993071Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjnq0vlvg 2022-05-18T04:49:31.0993973Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjnq0vlvg/_remote_module_non_scriptable.py 2022-05-18T04:49:31.1020403Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5r_fs9_w 2022-05-18T04:49:31.1022916Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5r_fs9_w/_remote_module_non_scriptable.py 2022-05-18T04:49:31.1214741Z dist init r=0, world=2 2022-05-18T04:49:31.1219124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:31.1251415Z dist init r=1, world=2 2022-05-18T04:49:31.1255894Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:31.1256793Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:31.1322680Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:32.4958038Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:32.4958604Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:33.0733944Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:33.0734523Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:33.1002438Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:33.1003431Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:33.1005169Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:33.1005956Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:33.1102465Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:33.1106403Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:33.1408407Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:33.1408911Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
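The "[W python_variable.cpp:205]" warnings describe tensors that are deallocated while a Python weak reference to them is still live. A rough, purely illustrative sketch of the pattern the warning text refers to; none of this is taken from the test code:

    import weakref
    import torch

    t = torch.ones(3)
    w = weakref.ref(t)   # take out a weak reference to the Tensor
    assert w() is t      # dereference it

    # Per the warning text, the internal _fix_weakref() hook is expected to
    # run after such a dereference; if the Tensor is freed without it, later
    # accesses through the weak reference can fail.
    del t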
2022-05-18T04:49:33.2452632Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:33.2454014Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:33.2459717Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:33.2461110Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:33.3043860Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:33.3044820Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:33.3233442Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:33.3236420Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:33.8060848Z ok (3.632s) 2022-05-18T04:49:33.8190834Z test_transformer_parameterized_offload_true_none_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48692 2022-05-18T04:49:33.8297639Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48693 2022-05-18T04:49:34.7245291Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi1xbtmuc 2022-05-18T04:49:34.7246408Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi1xbtmuc/_remote_module_non_scriptable.py 2022-05-18T04:49:34.7476084Z dist init r=1, world=2 2022-05-18T04:49:34.7480304Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:34.7546769Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpr_ys607b 2022-05-18T04:49:34.7549671Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpr_ys607b/_remote_module_non_scriptable.py 2022-05-18T04:49:34.7766569Z dist init r=0, world=2 2022-05-18T04:49:34.7770699Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:34.7772024Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:34.7786888Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
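The clip_norm_type_2_0 and clip_norm_type_None suffixes in these test names indicate whether the parity run also exercises gradient clipping with a 2-norm, or skips clipping entirely. A minimal sketch of the 2-norm variant as it is usually applied per optimizer step; the max_norm value is an assumption:

    import torch
    import torch.nn as nn

    model = nn.Linear(8, 8)
    loss = model(torch.randn(2, 8)).sum()
    loss.backward()

    # norm_type=2.0 corresponds to the "_clip_norm_type_2_0" variants; the
    # "_None" variants presumably omit this call.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.3, norm_type=2.0)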
2022-05-18T04:49:36.1431880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:36.1432431Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:36.7311648Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:36.7333965Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:36.7577965Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:36.7578672Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:36.7612032Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:36.7612659Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:36.7676381Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:36.7715970Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:36.8017731Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:36.8026688Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:36.9096152Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:36.9096868Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:36.9108464Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:36.9109153Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:36.9706897Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:36.9715687Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:36.9896778Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:49:36.9914057Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:37.4390820Z ok (3.633s) 2022-05-18T04:49:37.4516824Z test_transformer_parameterized_offload_true_prefetch_post_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48779 2022-05-18T04:49:37.4625063Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48780 2022-05-18T04:49:38.3645521Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpap9m500h 2022-05-18T04:49:38.3646162Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp37fu26aq 2022-05-18T04:49:38.3646743Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpap9m500h/_remote_module_non_scriptable.py 2022-05-18T04:49:38.3647303Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp37fu26aq/_remote_module_non_scriptable.py 2022-05-18T04:49:38.3867205Z dist init r=1, world=2 2022-05-18T04:49:38.3871541Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:38.3874380Z dist init r=0, world=2 2022-05-18T04:49:38.3879010Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:38.3879811Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:38.3974520Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:39.7675887Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:39.7676419Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:40.3465752Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:40.3479600Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:40.3735238Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:40.3735905Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:40.3753234Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:40.3753881Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:40.3831988Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:49:40.3852032Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:40.4150628Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:40.4155907Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:40.5178259Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:40.5179071Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:40.5180029Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:40.5180679Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:40.5764843Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:40.5770021Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:40.5953587Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:40.5962372Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:41.0717134Z ok (3.632s) 2022-05-18T04:49:41.0846254Z test_transformer_parameterized_offload_true_prefetch_post_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48866 2022-05-18T04:49:41.0951736Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48867 2022-05-18T04:49:42.0040148Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6rvwzmqc 2022-05-18T04:49:42.0041397Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6rvwzmqc/_remote_module_non_scriptable.py 2022-05-18T04:49:42.0044040Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5y8c5cf9 2022-05-18T04:49:42.0046811Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5y8c5cf9/_remote_module_non_scriptable.py 2022-05-18T04:49:42.0265480Z dist init r=0, world=2 2022-05-18T04:49:42.0269690Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:42.0273246Z dist init r=1, world=2 2022-05-18T04:49:42.0277879Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:42.0278707Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:42.0373242Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:43.4010593Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:43.4011373Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:43.9854082Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:43.9875534Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:44.0115220Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:44.0116319Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:44.0146001Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:44.0146799Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:44.0209476Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:44.0245531Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:44.0540645Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:44.0552223Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
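Each "Started process N with pid ..." line corresponds to the test harness launching one worker per rank. A simplified, hedged sketch of how two ranks are commonly spawned for this kind of test; the worker body is illustrative and not the actual harness:

    import torch.multiprocessing as mp

    def _worker(rank: int, world_size: int) -> None:
        # A real worker would call init_process_group(rank=rank, ...) and then
        # run the test body, mirroring the per-rank log lines above.
        print(f"dist init r={rank}, world={world_size}")

    if __name__ == "__main__":
        world_size = 2
        mp.spawn(_worker, args=(world_size,), nprocs=world_size)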
2022-05-18T04:49:44.1573879Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:44.1574573Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:44.1588696Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:44.1589363Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:44.2166714Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:44.2179966Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:44.2348242Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:44.2370568Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:44.7049080Z ok (3.633s) 2022-05-18T04:49:44.7176917Z test_transformer_parameterized_offload_true_prefetch_post_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 48953 2022-05-18T04:49:44.7280507Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 48954 2022-05-18T04:49:45.6198336Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf9n1sdcw 2022-05-18T04:49:45.6199273Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf9n1sdcw/_remote_module_non_scriptable.py 2022-05-18T04:49:45.6213667Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_nk8wkzo 2022-05-18T04:49:45.6216292Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_nk8wkzo/_remote_module_non_scriptable.py 2022-05-18T04:49:45.6427249Z dist init r=0, world=2 2022-05-18T04:49:45.6431528Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:45.6434060Z dist init r=1, world=2 2022-05-18T04:49:45.6438283Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:45.6439627Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:45.6534688Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
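The instantiator messages ("Created a temporary directory ...", "Writing ..._remote_module_non_scriptable.py") appear once per spawned worker and seem to be triggered when the RPC remote-module helpers are first imported. A minimal, hedged reproduction; that the plain import is sufficient on this exact build is an assumption:

    # Importing the distributed nn/RPC helpers generates the non-scriptable
    # RemoteModule template into a temporary directory, which is what the
    # instantiator INFO lines above report.
    import torch.distributed.nn  # noqa: F401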
2022-05-18T04:49:46.9960925Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:46.9961499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:47.5759104Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:47.5772087Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:47.6025316Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:47.6026020Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:47.6036630Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:47.6037283Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:47.6124604Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:47.6134935Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:47.6431634Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:47.6432173Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:47.7480880Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:47.7481608Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:47.7484891Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:47.7485571Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:47.8049368Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:47.8049906Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:47.8237808Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:49:47.8239667Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:48.3373525Z ok (3.632s) 2022-05-18T04:49:48.3503976Z test_transformer_parameterized_offload_true_prefetch_post_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49040 2022-05-18T04:49:48.3608733Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49041 2022-05-18T04:49:49.2489937Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv1sjbzz2 2022-05-18T04:49:49.2491158Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv1sjbzz2/_remote_module_non_scriptable.py 2022-05-18T04:49:49.2554438Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5pwqm55x 2022-05-18T04:49:49.2557252Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5pwqm55x/_remote_module_non_scriptable.py 2022-05-18T04:49:49.2708925Z dist init r=0, world=2 2022-05-18T04:49:49.2713206Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:49.2786318Z dist init r=1, world=2 2022-05-18T04:49:49.2790414Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:49.2791643Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:49.2816335Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:50.6587998Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:50.6588556Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:51.2414863Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:51.2433196Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:51.2682066Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:51.2682742Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:51.2705180Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:51.2705841Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:51.2790797Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:49:51.2806243Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:51.3099800Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:51.3105336Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:51.4156806Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:51.4157643Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:51.4164425Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:51.4165103Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:51.4738980Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:51.4744680Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:51.4931595Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:51.4935157Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:51.9701551Z ok (3.633s) 2022-05-18T04:49:51.9830467Z test_transformer_parameterized_offload_true_prefetch_post_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49127 2022-05-18T04:49:51.9938103Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49128 2022-05-18T04:49:52.9002614Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprrr6ewif 2022-05-18T04:49:52.9004501Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprrr6ewif/_remote_module_non_scriptable.py 2022-05-18T04:49:52.9018497Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpawwcijf7 2022-05-18T04:49:52.9021196Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpawwcijf7/_remote_module_non_scriptable.py 2022-05-18T04:49:52.9238714Z dist init r=0, world=2 2022-05-18T04:49:52.9243292Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:52.9248786Z dist init r=1, world=2 2022-05-18T04:49:52.9253428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:52.9254589Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:52.9347062Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:54.2961183Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:54.2962076Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:54.8798893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:54.8819331Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:54.9058139Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:54.9059115Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:54.9089206Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:54.9090000Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:54.9156679Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:54.9191860Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:54.9483370Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:54.9494028Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
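
The UserWarning above ("Module is input on CPU, we are moving it to ...") is emitted by FSDP when the wrapped module's parameters still live on CPU, which is expected here since these parameterizations run with offload_true. A minimal sketch of that setup, assuming an already-initialized default process group and a hypothetical MyTransformer module:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload

    # Parameters start on CPU, so FSDP emits the "Module is input on CPU" warning,
    # temporarily moves the module to the current CUDA device for parameter
    # verification, flattening and sharding, then moves it back (per the warning text).
    model = MyTransformer()  # hypothetical module, assumed to be defined elsewhere
    fsdp_model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))
    # Without offloading, calling model.to(torch.cuda.current_device()) before
    # wrapping keeps the parameters on the GPU and avoids the warning.
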
2022-05-18T04:49:55.0548338Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:55.0549050Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:55.0559646Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:55.0560339Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:55.1131204Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:55.1142306Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:55.1319770Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:55.1331057Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:55.6030646Z ok (3.633s) 2022-05-18T04:49:55.6157087Z test_transformer_parameterized_offload_true_prefetch_post_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49214 2022-05-18T04:49:55.6261591Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49215 2022-05-18T04:49:56.5360104Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpryac5ftu 2022-05-18T04:49:56.5361269Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpryac5ftu/_remote_module_non_scriptable.py 2022-05-18T04:49:56.5384981Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwlltqzzn 2022-05-18T04:49:56.5387623Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwlltqzzn/_remote_module_non_scriptable.py 2022-05-18T04:49:56.5584968Z dist init r=0, world=2 2022-05-18T04:49:56.5589106Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:49:56.5617719Z dist init r=1, world=2 2022-05-18T04:49:56.5622467Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:49:56.5623493Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:49:56.5692496Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
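
The "dist init r=..., world=2" and store-based-barrier lines above are printed while each spawned test process joins the default process group. A minimal per-rank sketch, assuming rendezvous settings comparable to what the test harness provides and NCCL as the backend:

    import os
    import torch.distributed as dist

    # Rendezvous settings; the values actually used by the test harness may differ.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")

    rank = int(os.environ.get("RANK", "0"))
    world_size = int(os.environ.get("WORLD_SIZE", "2"))
    # init_process_group performs the store-based barrier logged above
    # ("Added key: store_based_barrier_key:1 ... Completed store-based barrier").
    dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
    print(f"dist init r={rank}, world={world_size}")
    dist.destroy_process_group()
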
2022-05-18T04:49:57.9368920Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:49:57.9369461Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:49:58.5267468Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:58.5291918Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:58.5534309Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:58.5579446Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:58.5580319Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:49:58.5580963Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:49:58.5632282Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:58.5697547Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:58.6001008Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:58.6013948Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:58.7107928Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:58.7108657Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:58.7135635Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:49:58.7136320Z warnings.warn(msg, FutureWarning) 2022-05-18T04:49:58.7742964Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:58.7753628Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:49:58.7931877Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:49:58.7958328Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:49:59.2356890Z ok (3.632s) 2022-05-18T04:49:59.2482257Z test_transformer_parameterized_offload_true_prefetch_pre_no_shard_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49301 2022-05-18T04:49:59.2587293Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49302 2022-05-18T04:50:00.1594803Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4a__6eo6 2022-05-18T04:50:00.1595997Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4a__6eo6/_remote_module_non_scriptable.py 2022-05-18T04:50:00.1791360Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0dj03utk 2022-05-18T04:50:00.1794403Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0dj03utk/_remote_module_non_scriptable.py 2022-05-18T04:50:00.1818470Z dist init r=0, world=2 2022-05-18T04:50:00.1822957Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:00.2037735Z dist init r=1, world=2 2022-05-18T04:50:00.2042589Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:00.2043972Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:00.2129952Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:01.5854668Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:01.5855215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:02.1784015Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:02.1804957Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:02.2052232Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:50:02.2052889Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:50:02.2091562Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:50:02.2092207Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:50:02.2149362Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:50:02.2205335Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:02.2508567Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:02.2515174Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:02.3553405Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:50:02.3554389Z warnings.warn(msg, FutureWarning) 2022-05-18T04:50:02.3565646Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:50:02.3566318Z warnings.warn(msg, FutureWarning) 2022-05-18T04:50:02.4166919Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:02.4174428Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:02.4355896Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:02.4379715Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:02.8679810Z ok (3.632s) 2022-05-18T04:50:02.8809302Z test_transformer_parameterized_offload_true_prefetch_pre_no_shard_clip_norm_type_None (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49388 2022-05-18T04:50:02.8913503Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49389 2022-05-18T04:50:03.7999045Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpobpjkdnv 2022-05-18T04:50:03.8001030Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpobpjkdnv/_remote_module_non_scriptable.py 2022-05-18T04:50:03.8242901Z dist init r=1, world=2 2022-05-18T04:50:03.8247820Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:03.8519733Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpghmvqdf9 2022-05-18T04:50:03.8522993Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpghmvqdf9/_remote_module_non_scriptable.py 2022-05-18T04:50:03.8748426Z dist init r=0, world=2 2022-05-18T04:50:03.8753138Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:03.8754110Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:03.8757997Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:05.2632427Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:05.2632998Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:05.8446993Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:05.8470929Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:05.8711780Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:50:05.8712692Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:50:05.8757289Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:50:05.8758083Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:50:05.8809221Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:05.8866018Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:05.9173862Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:05.9186122Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:50:06.0232419Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:50:06.0233113Z warnings.warn(msg, FutureWarning) 2022-05-18T04:50:06.0254904Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:50:06.0255570Z warnings.warn(msg, FutureWarning) 2022-05-18T04:50:06.0858661Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:06.0871035Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:06.1046050Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:06.1077915Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:06.6012026Z ok (3.733s) 2022-05-18T04:50:06.6138972Z test_transformer_parameterized_offload_true_prefetch_pre_none_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49475 2022-05-18T04:50:06.6246629Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49476 2022-05-18T04:50:07.5215147Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjykj_5k9 2022-05-18T04:50:07.5216072Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjykj_5k9/_remote_module_non_scriptable.py 2022-05-18T04:50:07.5387253Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxmsx77xa 2022-05-18T04:50:07.5390235Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxmsx77xa/_remote_module_non_scriptable.py 2022-05-18T04:50:07.5434908Z dist init r=1, world=2 2022-05-18T04:50:07.5439099Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:07.5633542Z dist init r=0, world=2 2022-05-18T04:50:07.5638239Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:07.5639572Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:07.5644103Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:50:08.9562390Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:08.9562930Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:09.5334008Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:09.5335046Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:09.5598955Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:50:09.5600208Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:50:09.5610631Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:50:09.5611899Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:50:09.5700021Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:09.5716265Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:09.6024737Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:09.6025688Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:09.7094044Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:50:09.7095340Z warnings.warn(msg, FutureWarning) 2022-05-18T04:50:09.7105217Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:50:09.7106649Z warnings.warn(msg, FutureWarning) 2022-05-18T04:50:09.7688899Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:09.7689893Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:09.7877512Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:50:09.7886295Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:10.2336623Z ok (3.632s) 2022-05-18T04:50:10.2463638Z test_transformer_parameterized_offload_true_prefetch_pre_none_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49562 2022-05-18T04:50:10.2567692Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49563 2022-05-18T04:50:11.1541321Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvxp3tri6 2022-05-18T04:50:11.1543135Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvxp3tri6/_remote_module_non_scriptable.py 2022-05-18T04:50:11.1567877Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpie3kzu8c 2022-05-18T04:50:11.1570736Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpie3kzu8c/_remote_module_non_scriptable.py 2022-05-18T04:50:11.1785752Z dist init r=1, world=2 2022-05-18T04:50:11.1790043Z dist init r=0, world=2 2022-05-18T04:50:11.1790460Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:11.1794365Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:11.1795484Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:11.1895150Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:12.5288779Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:12.5289337Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:13.1074479Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:13.1075038Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:13.1335704Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:50:13.1336392Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:50:13.1343992Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:50:13.1344642Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:50:13.1433293Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:50:13.1443082Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:13.1736932Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:13.1743704Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:13.2793409Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:50:13.2794083Z warnings.warn(msg, FutureWarning) 2022-05-18T04:50:13.2795136Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:50:13.2795804Z warnings.warn(msg, FutureWarning) 2022-05-18T04:50:13.3357265Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:13.3363291Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:13.3541782Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:13.3554861Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:13.8658853Z ok (3.632s) 2022-05-18T04:50:13.8784502Z test_transformer_parameterized_offload_true_prefetch_pre_shard_grad_op_clip_norm_type_2_0 (__main__.TestParityWithDDP) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49649 2022-05-18T04:50:13.8887729Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49650 2022-05-18T04:50:14.7858873Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps743792d 2022-05-18T04:50:14.7860188Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps743792d/_remote_module_non_scriptable.py 2022-05-18T04:50:14.7918397Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3zaab9yv 2022-05-18T04:50:14.7921510Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3zaab9yv/_remote_module_non_scriptable.py 2022-05-18T04:50:14.8084106Z dist init r=0, world=2 2022-05-18T04:50:14.8088395Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:14.8168122Z dist init r=1, world=2 2022-05-18T04:50:14.8173122Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:14.8174542Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:14.8191535Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:16.1946698Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:16.1947269Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:16.7814557Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:16.7820446Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:16.8081573Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:50:16.8082687Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:50:16.8108390Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:50:16.8109159Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:50:16.8181446Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:16.8224481Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:16.8525961Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:16.8535679Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
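
The repeated "Reducer buckets have been rebuilt in this iteration" lines come from DistributedDataParallel, which these parity tests run alongside FSDP; the message is logged during the first training iterations, when DDP reorders its gradient buckets based on the order gradients become ready. A minimal sketch, assuming an initialized NCCL group and placeholder model/batch objects:

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    device = torch.cuda.current_device()
    # `model` is a placeholder nn.Module and `batch` a placeholder input tensor.
    ddp_model = DDP(model.to(device), device_ids=[device])
    out = ddp_model(batch.to(device))
    out.sum().backward()  # early backward passes trigger the bucket-rebuild INFO line
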
2022-05-18T04:50:16.9623193Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:50:16.9623884Z warnings.warn(msg, FutureWarning) 2022-05-18T04:50:16.9646428Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:50:16.9647081Z warnings.warn(msg, FutureWarning) 2022-05-18T04:50:17.0255140Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:17.0264980Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:17.0443916Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:17.0472692Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:17.4980169Z ok (3.632s) 2022-05-18T04:50:17.5106289Z test_transformer_parameterized_offload_true_prefetch_pre_shard_grad_op_clip_norm_type_None (__main__.TestParityWithDDP) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49736 2022-05-18T04:50:17.5210980Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49737 2022-05-18T04:50:18.4273408Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpirc9bbt7 2022-05-18T04:50:18.4274694Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpirc9bbt7/_remote_module_non_scriptable.py 2022-05-18T04:50:18.4411951Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnf2m3teh 2022-05-18T04:50:18.4415227Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnf2m3teh/_remote_module_non_scriptable.py 2022-05-18T04:50:18.4494759Z dist init r=1, world=2 2022-05-18T04:50:18.4498969Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:18.4659282Z dist init r=0, world=2 2022-05-18T04:50:18.4663752Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:18.4665197Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:18.4704026Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
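
The FutureWarning repeated above asks callers to migrate from torch.testing.assert_allclose to torch.testing.assert_close (deprecated since 1.12, removal planned for 1.14; see the linked issue #61844). A minimal before/after sketch:

    import torch

    expected = torch.tensor([1.0, 2.0, 3.0])
    actual = expected + 1e-8

    # Deprecated spelling (emits the FutureWarning captured in this log):
    # torch.testing.assert_allclose(actual, expected)

    # Replacement; note the default tolerances are dtype-dependent and differ
    # slightly from assert_allclose, as described in the upgrade instructions.
    torch.testing.assert_close(actual, expected)
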
2022-05-18T04:50:19.8664285Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:19.8664832Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:20.4614060Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:20.4614612Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:20.4880242Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:50:20.4880923Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:50:20.4887663Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T04:50:20.4888314Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T04:50:20.4978500Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:20.5003305Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:20.5311905Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:20.5312427Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:20.6372427Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:50:20.6373132Z warnings.warn(msg, FutureWarning) 2022-05-18T04:50:20.6389917Z /opt/conda/lib/python3.7/site-packages/torch/testing/_deprecated.py:35: FutureWarning: torch.testing.assert_allclose() is deprecated since 1.12 and will be removed in 1.14. Use torch.testing.assert_close() instead. For detailed upgrade instructions see https://github.com/pytorch/pytorch/issues/61844. 2022-05-18T04:50:20.6390585Z warnings.warn(msg, FutureWarning) 2022-05-18T04:50:20.6990272Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:20.6990801Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:50:20.7179007Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T04:50:20.7182277Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T04:50:21.2306570Z ok (3.732s) 2022-05-18T04:50:21.2307102Z 2022-05-18T04:50:21.2308975Z ---------------------------------------------------------------------- 2022-05-18T04:50:21.2309402Z Ran 203 tests in 740.417s 2022-05-18T04:50:21.2309580Z 2022-05-18T04:50:21.2309680Z OK 2022-05-18T04:50:21.2311175Z 2022-05-18T04:50:21.2311573Z Generating XML reports... 2022-05-18T04:50:21.2361857Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestHooks-20220518043800.xml 2022-05-18T04:50:21.2367062Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestNoGrad-20220518043800.xml 2022-05-18T04:50:21.2371800Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParamInit-20220518043800.xml 2022-05-18T04:50:21.2574515Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParityWithDDP-20220518043800.xml 2022-05-18T04:50:21.5231769Z Running distributed/test_c10d_nccl ... [2022-05-18 04:50:21.522674] 2022-05-18T04:50:21.5232536Z Executing ['/opt/conda/bin/python', 'distributed/test_c10d_nccl.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 04:50:21.522776] 2022-05-18T04:50:22.4235725Z , <__main__.CommTest testMethod=test_broadcast_coalesced_nccl>, <__main__.CommTest testMethod=test_nccl_barrier>, <__main__.CommTest testMethod=test_nccl_barrier_device_ids>, <__main__.CommTest testMethod=test_nccl_barrier_device_ids_function_argument>, <__main__.CommTest testMethod=test_nccl_barrier_timeout>, <__main__.CommTest testMethod=test_nccl_barrier_timeout_new_group>, <__main__.CommTest testMethod=test_nccl_barrier_timeout_new_group_non_member>, <__main__.CommTest testMethod=test_nccl_warn_not_in_group_debug_detail>, <__main__.CommTest testMethod=test_nccl_warn_not_in_group_debug_info>, <__main__.CommTest testMethod=test_nccl_warn_not_in_group_debug_off>, <__main__.CommTest testMethod=test_pass_nccl_options_high_priority_stream>, <__main__.CommTest testMethod=test_sequence_num_incremented_nccl_default>, <__main__.CommTest testMethod=test_sequence_num_incremented_nccl_subgroup>, <__main__.CommTest testMethod=test_sequence_num_set_default_pg_nccl>, <__main__.CommTest testMethod=test_sequence_num_set_nccl_new_group>]> 2022-05-18T04:50:22.4237539Z test_all_reduce_coalesced_nccl (__main__.CommTest) 2022-05-18T04:50:22.4237871Z test_broadcast_coalesced_nccl (__main__.CommTest) 2022-05-18T04:50:22.4238194Z test_nccl_barrier (__main__.CommTest) 2022-05-18T04:50:22.4238523Z test_nccl_barrier_device_ids (__main__.CommTest) 2022-05-18T04:50:22.4238873Z test_nccl_barrier_device_ids_function_argument (__main__.CommTest) 2022-05-18T04:50:22.4239241Z test_nccl_barrier_timeout (__main__.CommTest) 2022-05-18T04:50:22.4239598Z test_nccl_barrier_timeout_new_group (__main__.CommTest) 2022-05-18T04:50:22.4239970Z test_nccl_barrier_timeout_new_group_non_member (__main__.CommTest) 2022-05-18T04:50:22.4240343Z test_nccl_warn_not_in_group_debug_detail (__main__.CommTest) 2022-05-18T04:50:22.4240713Z test_nccl_warn_not_in_group_debug_info (__main__.CommTest) 
2022-05-18T04:50:22.4242964Z test_nccl_warn_not_in_group_debug_off (__main__.CommTest) 2022-05-18T04:50:22.4243386Z test_pass_nccl_options_high_priority_stream (__main__.CommTest) 2022-05-18T04:50:22.4243786Z test_sequence_num_incremented_nccl_default (__main__.CommTest) 2022-05-18T04:50:22.4244344Z test_sequence_num_incremented_nccl_subgroup (__main__.CommTest) 2022-05-18T04:50:22.4244705Z test_sequence_num_set_default_pg_nccl (__main__.CommTest) 2022-05-18T04:50:22.4245302Z test_sequence_num_set_nccl_new_group (__main__.CommTest) 2022-05-18T04:50:22.4257297Z , <__main__.DistributedDataParallelTest testMethod=test_accumulate_gradients_module_with_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_arbitrary_forward_return_value>, <__main__.DistributedDataParallelTest testMethod=test_arbitrary_forward_return_value_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_bf16_compress_wrapper_is_view>, <__main__.DistributedDataParallelTest testMethod=test_bf16_compress_wrapper_nccl>, <__main__.DistributedDataParallelTest testMethod=test_builtin_ddp_comm_hooks_nccl>, <__main__.DistributedDataParallelTest testMethod=test_builtin_ddp_comm_hooks_nccl_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_dynamic_module>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_dynamic_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_allreduce_hook_nccl>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_allreduce_hook_nccl_static_graph>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_allreduce_with_then_hook_nccl>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_future_passing_gpu_nccl>, <__main__.DistributedDataParallelTest testMethod=test_ddp_multi_device_module_config>, <__main__.DistributedDataParallelTest testMethod=test_ddp_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_with_lazy_parameters>, <__main__.DistributedDataParallelTest testMethod=test_default_ddp_comm_hooks_nccl>, <__main__.DistributedDataParallelTest testMethod=test_default_ddp_comm_hooks_nccl_is_view>, <__main__.DistributedDataParallelTest testMethod=test_failure_recovery>, 
<__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_debug_detail>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_debug_info>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_debug_off>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_grad_is_view_debug_detail>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_grad_is_view_debug_info>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_kwarg_grad_is_view_debug_off>, <__main__.DistributedDataParallelTest testMethod=test_fp16>, <__main__.DistributedDataParallelTest testMethod=test_fp16_compress_wrapper_is_view>, <__main__.DistributedDataParallelTest testMethod=test_fp16_compress_wrapper_nccl>, <__main__.DistributedDataParallelTest testMethod=test_fp16_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_grad_layout_1devicemodule_1replicaperprocess>, <__main__.DistributedDataParallelTest testMethod=test_grad_layout_2devicemodule>, <__main__.DistributedDataParallelTest testMethod=test_invalid_powerSGD_state>, <__main__.DistributedDataParallelTest testMethod=test_multiple_outputs_multiple_backward>, <__main__.DistributedDataParallelTest testMethod=test_multiple_outputs_multiple_backward_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_1gpu_module_device_ids_integer_list>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_1gpu_module_device_ids_torch_device_list>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_2gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_4gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_multi_device_ids_not_allowed>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_multi_device_module_device_ids_None>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_single_device_module_device_ids_None>, <__main__.DistributedDataParallelTest testMethod=test_nccl_backend_single_device_module_empty_device_ids>, <__main__.DistributedDataParallelTest testMethod=test_nccl_propagate_error_reason>, <__main__.DistributedDataParallelTest testMethod=test_no_grad>, <__main__.DistributedDataParallelTest testMethod=test_param_layout_mismatch_error>, <__main__.DistributedDataParallelTest testMethod=test_pass_default_pg>, <__main__.DistributedDataParallelTest testMethod=test_powerSGD_ddp_comm_hook_nccl>, <__main__.DistributedDataParallelTest testMethod=test_powerSGD_ddp_comm_hook_nccl_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_empty_input>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_only_empty_input>]> 2022-05-18T04:50:22.4266217Z test_accumulate_gradients_module (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4266710Z test_accumulate_gradients_module_with_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4267168Z test_arbitrary_forward_return_value (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4267686Z test_arbitrary_forward_return_value_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4268154Z test_bf16_compress_wrapper_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4268604Z test_bf16_compress_wrapper_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4269024Z test_builtin_ddp_comm_hooks_nccl 
(__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4269483Z test_builtin_ddp_comm_hooks_nccl_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4269954Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4270431Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4270902Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4271391Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4271898Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4272408Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4272920Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4273411Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4273889Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4274374Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4274884Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4275478Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4275992Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4276460Z test_ddp_comm_hook_allreduce_hook_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4276984Z test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4277478Z test_ddp_comm_hook_allreduce_hook_nccl_static_graph (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4277942Z test_ddp_comm_hook_allreduce_with_then_hook_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4278412Z test_ddp_comm_hook_future_passing_gpu_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4278867Z test_ddp_multi_device_module_config (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4279297Z test_ddp_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4279710Z test_ddp_with_lazy_parameters (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4280147Z test_default_ddp_comm_hooks_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4280595Z test_default_ddp_comm_hooks_nccl_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4281010Z test_failure_recovery (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4281460Z test_find_unused_parameters_kwarg_debug_detail (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4281944Z test_find_unused_parameters_kwarg_debug_info (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4282419Z test_find_unused_parameters_kwarg_debug_off (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4282899Z test_find_unused_parameters_kwarg_grad_is_view_debug_detail (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4283412Z test_find_unused_parameters_kwarg_grad_is_view_debug_info (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4283923Z 
test_find_unused_parameters_kwarg_grad_is_view_debug_off (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4284343Z test_fp16 (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4284750Z test_fp16_compress_wrapper_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4285190Z test_fp16_compress_wrapper_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4285612Z test_fp16_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4286055Z test_grad_layout_1devicemodule_1replicaperprocess (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4286527Z test_grad_layout_2devicemodule (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4286960Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4287387Z test_multiple_outputs_multiple_backward (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4287861Z test_multiple_outputs_multiple_backward_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4288363Z test_nccl_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4288864Z test_nccl_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4289318Z test_nccl_backend_2gpu_module (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4289749Z test_nccl_backend_4gpu_module (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4290205Z test_nccl_backend_multi_device_ids_not_allowed (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4291605Z test_nccl_backend_multi_device_module_device_ids_None (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4292103Z test_nccl_backend_single_device_module_device_ids_None (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4292602Z test_nccl_backend_single_device_module_empty_device_ids (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4293071Z test_nccl_propagate_error_reason (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4293573Z test_no_grad (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4293982Z test_param_layout_mismatch_error (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4294401Z test_pass_default_pg (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4294804Z test_powerSGD_ddp_comm_hook_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4295326Z test_powerSGD_ddp_comm_hook_nccl_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4295790Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4296217Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) 2022-05-18T04:50:22.4296601Z 2022-05-18T04:50:22.4297857Z , <__main__.NcclErrorHandlingTest testMethod=test_nccl_blocking_wait_with_barrier>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_abort>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_clean_exit>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_nonzero_exit>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_sigkill>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_blocking_sigterm>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_errors_nonblocking>, <__main__.NcclErrorHandlingTest testMethod=test_nccl_timeout>]> 2022-05-18T04:50:22.4299087Z test_invalid_nccl_blocking_wait_env (__main__.NcclErrorHandlingTest) 2022-05-18T04:50:22.4299506Z 
test_nccl_blocking_wait_with_barrier (__main__.NcclErrorHandlingTest) 2022-05-18T04:50:22.4299915Z test_nccl_errors_blocking_abort (__main__.NcclErrorHandlingTest) 2022-05-18T04:50:22.4300304Z test_nccl_errors_blocking_clean_exit (__main__.NcclErrorHandlingTest) 2022-05-18T04:50:22.4300716Z test_nccl_errors_blocking_nonzero_exit (__main__.NcclErrorHandlingTest) 2022-05-18T04:50:22.4301128Z test_nccl_errors_blocking_sigkill (__main__.NcclErrorHandlingTest) 2022-05-18T04:50:22.4301535Z test_nccl_errors_blocking_sigterm (__main__.NcclErrorHandlingTest) 2022-05-18T04:50:22.4301921Z test_nccl_errors_nonblocking (__main__.NcclErrorHandlingTest) 2022-05-18T04:50:22.4302292Z test_nccl_timeout (__main__.NcclErrorHandlingTest) 2022-05-18T04:50:22.4302755Z ]> 2022-05-18T04:50:22.4303202Z test_init_no_gpus (__main__.ProcessGroupNCCLNoGPUTest) 2022-05-18T04:50:22.4305037Z , <__main__.ProcessGroupNCCLTest testMethod=test_allgather_base_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_allgather_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_allreduce_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_barrier>, <__main__.ProcessGroupNCCLTest testMethod=test_broadcast_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_empty_tensors>, <__main__.ProcessGroupNCCLTest testMethod=test_gather_checks>, <__main__.ProcessGroupNCCLTest testMethod=test_gather_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_gather_stress>, <__main__.ProcessGroupNCCLTest testMethod=test_reduce_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_reduce_scatter_base_basics>, <__main__.ProcessGroupNCCLTest testMethod=test_reduce_scatter_base_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_reduce_scatter_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_scatter_checks>, <__main__.ProcessGroupNCCLTest testMethod=test_scatter_ops>, <__main__.ProcessGroupNCCLTest testMethod=test_scatter_stress>]> 2022-05-18T04:50:22.4306968Z test_allgather_base_basics (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4307355Z test_allgather_base_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4307727Z test_allgather_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4308071Z test_allreduce_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4308490Z test_barrier (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4308841Z test_broadcast_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4309182Z test_empty_tensors (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4309536Z test_gather_checks (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4309891Z test_gather_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4310325Z test_gather_stress (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4310678Z test_reduce_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4311052Z test_reduce_scatter_base_basics (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4311447Z test_reduce_scatter_base_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4311813Z test_reduce_scatter_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4312181Z test_scatter_checks (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4312536Z test_scatter_ops (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4313157Z test_scatter_stress (__main__.ProcessGroupNCCLTest) 2022-05-18T04:50:22.4313616Z ]> 2022-05-18T04:50:22.4314034Z test_common_errors (__main__.RendezvousEnvTest) 2022-05-18T04:50:22.4314365Z 2022-05-18T04:50:22.4314772Z ]> 2022-05-18T04:50:22.4315200Z test_default_store_timeout_nccl (__main__.TimeoutTest) 
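The listing above is the suite-discovery output for distributed/test_c10d_nccl; everything that follows is the per-test execution log, in which each case spawns two worker processes (rank 0 and rank 1) on this runner and tears them down again. As a rough, hedged illustration of that shape only (the real harness is the internal MultiProcessTestCase machinery referenced in the INFO lines below; the FileStore path, tensor size, and world size of 2 here are assumptions), a standalone two-process NCCL all_reduce can be sketched as:

```python
import tempfile

import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def worker(rank: int, world_size: int, store_path: str) -> None:
    # Every rank joins the same NCCL process group through a shared FileStore,
    # mirroring the "Started process N with pid ..." lines in the log below.
    store = dist.FileStore(store_path, world_size)
    dist.init_process_group("nccl", store=store, rank=rank, world_size=world_size)

    # One CUDA tensor per rank; after all_reduce every rank holds the sum of all ranks.
    t = torch.full((4,), float(rank), device=f"cuda:{rank}")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    assert torch.all(t == float(sum(range(world_size))))

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    store_path = tempfile.NamedTemporaryFile(delete=False).name
    mp.spawn(worker, args=(world_size, store_path), nprocs=world_size, join=True)
```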
2022-05-18T04:50:23.3186905Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:50:23.3201193Z 2022-05-18T04:50:23.3201447Z Running tests... 2022-05-18T04:50:23.3201876Z ---------------------------------------------------------------------- 2022-05-18T04:50:24.9197304Z test_all_reduce_coalesced_nccl (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:24.9541729Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 49895 2022-05-18T04:50:24.9647440Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 49896 2022-05-18T04:50:25.8766258Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:25.9106563Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:28.5746946Z ok (5.254s) 2022-05-18T04:50:28.5747171Z 2022-05-18T04:50:28.5747754Z ---------------------------------------------------------------------- 2022-05-18T04:50:28.5748150Z Ran 1 test in 5.255s 2022-05-18T04:50:28.5748321Z 2022-05-18T04:50:28.5748400Z OK 2022-05-18T04:50:28.5748541Z 2022-05-18T04:50:28.5748687Z Generating XML reports... 2022-05-18T04:50:28.5790916Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045023.xml 2022-05-18T04:50:29.7633909Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:50:29.7649718Z 2022-05-18T04:50:29.7650085Z Running tests... 2022-05-18T04:50:29.7650583Z ---------------------------------------------------------------------- 2022-05-18T04:50:31.4139997Z test_broadcast_coalesced_nccl (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:31.4494612Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50018 2022-05-18T04:50:31.4599390Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50019 2022-05-18T04:50:32.3755903Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:32.4040401Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:35.0697500Z ok (5.304s) 2022-05-18T04:50:35.0697872Z 2022-05-18T04:50:35.0698308Z ---------------------------------------------------------------------- 2022-05-18T04:50:35.0698640Z Ran 1 test in 5.305s 2022-05-18T04:50:35.0698812Z 2022-05-18T04:50:35.0698913Z OK 2022-05-18T04:50:35.0699050Z 2022-05-18T04:50:35.0699497Z Generating XML reports... 2022-05-18T04:50:35.0740884Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045029.xml 2022-05-18T04:50:36.2567736Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:50:36.2583219Z 2022-05-18T04:50:36.2583525Z Running tests... 2022-05-18T04:50:36.2584179Z ---------------------------------------------------------------------- 2022-05-18T04:50:37.9014652Z test_nccl_barrier (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:37.9360710Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50141 2022-05-18T04:50:37.9466733Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50142 2022-05-18T04:50:38.8314094Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:38.8425387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:39.0508530Z skip: Need at least 4 CUDA devices (2.792s) 2022-05-18T04:50:39.0508804Z 2022-05-18T04:50:39.0509172Z ---------------------------------------------------------------------- 2022-05-18T04:50:39.0509522Z Ran 1 test in 2.792s 2022-05-18T04:50:39.0509691Z 2022-05-18T04:50:39.0509805Z OK (skipped=1) 2022-05-18T04:50:39.0509962Z 2022-05-18T04:50:39.0510089Z Generating XML reports... 2022-05-18T04:50:39.0565488Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045036.xml 2022-05-18T04:50:40.2187870Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:50:40.2203517Z 2022-05-18T04:50:40.2203956Z Running tests... 2022-05-18T04:50:40.2204479Z ---------------------------------------------------------------------- 2022-05-18T04:50:41.8618752Z test_nccl_barrier_device_ids (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:41.8967513Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50250 2022-05-18T04:50:41.9072992Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50251 2022-05-18T04:50:42.8011312Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:42.8013684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:42.8073384Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:42.8077029Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:42.8077888Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:42.8116602Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:44.5147658Z ok (4.294s) 2022-05-18T04:50:44.5147929Z 2022-05-18T04:50:44.5148545Z ---------------------------------------------------------------------- 2022-05-18T04:50:44.5148917Z Ran 1 test in 4.294s 2022-05-18T04:50:44.5149089Z 2022-05-18T04:50:44.5149189Z OK 2022-05-18T04:50:44.5149332Z 2022-05-18T04:50:44.5149448Z Generating XML reports... 2022-05-18T04:50:44.5192721Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045040.xml 2022-05-18T04:50:45.6862016Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:50:45.6876415Z 2022-05-18T04:50:45.6876656Z Running tests... 2022-05-18T04:50:45.6877105Z ---------------------------------------------------------------------- 2022-05-18T04:50:47.2922674Z test_nccl_barrier_device_ids_function_argument (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:47.3270928Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50372 2022-05-18T04:50:47.3377504Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50373 2022-05-18T04:50:48.2701432Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:48.2704229Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:50:48.3130137Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:48.3134567Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:50:48.3135693Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:48.3214146Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:50:48.5422503Z ok (2.854s) 2022-05-18T04:50:48.5422749Z 2022-05-18T04:50:48.5423407Z ---------------------------------------------------------------------- 2022-05-18T04:50:48.5423781Z Ran 1 test in 2.855s 2022-05-18T04:50:48.5423956Z 2022-05-18T04:50:48.5424058Z OK 2022-05-18T04:50:48.5424201Z 2022-05-18T04:50:48.5424343Z Generating XML reports... 2022-05-18T04:50:48.5469392Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045045.xml 2022-05-18T04:50:49.7138828Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:50:49.7154445Z 2022-05-18T04:50:49.7154701Z Running tests... 2022-05-18T04:50:49.7155152Z ---------------------------------------------------------------------- 2022-05-18T04:50:51.3557252Z test_nccl_barrier_timeout (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:51.3903228Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50485 2022-05-18T04:50:51.4007529Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50486 2022-05-18T04:50:52.3294019Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:52.3449178Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:52.5049912Z skip: Need at least 4 CUDA devices (2.789s) 2022-05-18T04:50:52.5050173Z 2022-05-18T04:50:52.5050787Z ---------------------------------------------------------------------- 2022-05-18T04:50:52.5051159Z Ran 1 test in 2.789s 2022-05-18T04:50:52.5051327Z 2022-05-18T04:50:52.5051450Z OK (skipped=1) 2022-05-18T04:50:52.5051609Z 2022-05-18T04:50:52.5051738Z Generating XML reports... 2022-05-18T04:50:52.5106166Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045049.xml 2022-05-18T04:50:53.6788356Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:50:53.6803070Z 2022-05-18T04:50:53.6803520Z Running tests... 2022-05-18T04:50:53.6804012Z ---------------------------------------------------------------------- 2022-05-18T04:50:55.3335256Z test_nccl_barrier_timeout_new_group (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:55.3689341Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50594 2022-05-18T04:50:55.3796770Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50595 2022-05-18T04:50:56.2665679Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:50:56.2782780Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:50:56.4839610Z skip: Need at least 4 CUDA devices (2.803s) 2022-05-18T04:50:56.4840050Z 2022-05-18T04:50:56.4840701Z ---------------------------------------------------------------------- 2022-05-18T04:50:56.4841286Z Ran 1 test in 2.804s 2022-05-18T04:50:56.4841600Z 2022-05-18T04:50:56.4841799Z OK (skipped=1) 2022-05-18T04:50:56.4842085Z 2022-05-18T04:50:56.4842327Z Generating XML reports... 2022-05-18T04:50:56.4898366Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045053.xml 2022-05-18T04:50:57.6601361Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:50:57.6616341Z 2022-05-18T04:50:57.6616726Z Running tests... 2022-05-18T04:50:57.6617460Z ---------------------------------------------------------------------- 2022-05-18T04:50:59.3131690Z test_nccl_barrier_timeout_new_group_non_member (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:50:59.3488051Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50703 2022-05-18T04:50:59.3593604Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50704 2022-05-18T04:51:00.2433840Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:00.2633226Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:00.4636747Z skip: Need at least 4 CUDA devices (2.802s) 2022-05-18T04:51:00.4636982Z 2022-05-18T04:51:00.4637373Z ---------------------------------------------------------------------- 2022-05-18T04:51:00.4637719Z Ran 1 test in 2.802s 2022-05-18T04:51:00.4637889Z 2022-05-18T04:51:00.4637989Z OK (skipped=1) 2022-05-18T04:51:00.4638148Z 2022-05-18T04:51:00.4638278Z Generating XML reports... 2022-05-18T04:51:00.4695334Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045057.xml 2022-05-18T04:51:01.6225752Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:51:01.6239822Z 2022-05-18T04:51:01.6240035Z Running tests... 2022-05-18T04:51:01.6240485Z ---------------------------------------------------------------------- 2022-05-18T04:51:03.2545786Z test_nccl_warn_not_in_group_debug_detail (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:03.2896600Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50812 2022-05-18T04:51:03.3001404Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50813 2022-05-18T04:51:04.2069940Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:04.2309655Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:04.2485726Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:04.2486264Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:04.2487045Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:04.2487746Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:04.2488301Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:51:04.2492411Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:51:04.2493459Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:51:04.2591561Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:51:06.0081912Z ok (4.384s) 2022-05-18T04:51:06.0082940Z 2022-05-18T04:51:06.0083628Z ---------------------------------------------------------------------- 2022-05-18T04:51:06.0084260Z Ran 1 test in 4.384s 2022-05-18T04:51:06.0084552Z 2022-05-18T04:51:06.0084714Z OK 2022-05-18T04:51:06.0084944Z 2022-05-18T04:51:06.0085193Z Generating XML reports... 2022-05-18T04:51:06.0128079Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045101.xml 2022-05-18T04:51:07.1820499Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:51:07.1835266Z 2022-05-18T04:51:07.1835502Z Running tests... 2022-05-18T04:51:07.1835951Z ---------------------------------------------------------------------- 2022-05-18T04:51:08.7793693Z test_nccl_warn_not_in_group_debug_info (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:08.8143408Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 50949 2022-05-18T04:51:08.8248080Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 50950 2022-05-18T04:51:09.7129957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:09.7135349Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:09.7252517Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:09.7256301Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:09.7257269Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:51:09.7258277Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:51:09.7337875Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:09.7340772Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:51:09.7341626Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:51:09.7361532Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:51:11.4323629Z ok (4.248s) 2022-05-18T04:51:11.4323952Z 2022-05-18T04:51:11.4324542Z ---------------------------------------------------------------------- 2022-05-18T04:51:11.4324903Z Ran 1 test in 4.249s 2022-05-18T04:51:11.4325070Z 2022-05-18T04:51:11.4325175Z OK 2022-05-18T04:51:11.4325294Z 2022-05-18T04:51:11.4325433Z Generating XML reports... 2022-05-18T04:51:11.4369040Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045107.xml 2022-05-18T04:51:12.6226225Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:51:12.6240812Z 2022-05-18T04:51:12.6241240Z Running tests... 2022-05-18T04:51:12.6241764Z ---------------------------------------------------------------------- 2022-05-18T04:51:14.2846466Z test_nccl_warn_not_in_group_debug_off (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:14.3203929Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51077 2022-05-18T04:51:14.3312288Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51078 2022-05-18T04:51:15.2410419Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:15.2413763Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:15.2787904Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:15.2791811Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:15.2792785Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:15.2794328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:51:15.2821993Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:15.2824701Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:51:15.2825364Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:51:15.2898112Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
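The test_nccl_warn_not_in_group_debug_detail/info/off cases running here appear to differ only in the TORCH_DISTRIBUTED_DEBUG level in effect, and the store_based_barrier_key INFO lines come from group creation: init_process_group performs one store-based barrier (key:1) and each subsequent new_group performs another (key:2). A hedged sketch of that pattern, with the subgroup ranks and the collective chosen here as placeholders (run per rank inside a group set up as in the spawn sketch earlier):

```python
import os

import torch
import torch.distributed as dist

# Debug level read when the process group is created: OFF, INFO, or DETAIL.
# It must therefore be set before init_process_group runs.
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"


def run(rank: int) -> None:
    # Assumes dist.init_process_group("nccl", ...) already ran on every rank,
    # which produced the store_based_barrier_key:1 lines above.
    subgroup = dist.new_group(ranks=[0])  # produces the store_based_barrier_key:2 lines

    tensor = torch.ones(1, device=f"cuda:{rank}")
    if rank == 0:
        # Only subgroup members pass the subgroup to a collective; a rank outside
        # the group doing so triggers the "not in group" warning these tests check.
        dist.all_reduce(tensor, group=subgroup)
```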
2022-05-18T04:51:16.9388658Z ok (4.314s) 2022-05-18T04:51:16.9389052Z 2022-05-18T04:51:16.9389729Z ---------------------------------------------------------------------- 2022-05-18T04:51:16.9390360Z Ran 1 test in 4.315s 2022-05-18T04:51:16.9390668Z 2022-05-18T04:51:16.9390830Z OK 2022-05-18T04:51:16.9391076Z 2022-05-18T04:51:16.9391314Z Generating XML reports... 2022-05-18T04:51:16.9434749Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045112.xml 2022-05-18T04:51:18.1193781Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:51:18.1208495Z 2022-05-18T04:51:18.1208851Z Running tests... 2022-05-18T04:51:18.1209311Z ---------------------------------------------------------------------- 2022-05-18T04:51:19.7244761Z test_pass_nccl_options_high_priority_stream (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:19.7594545Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51205 2022-05-18T04:51:19.7704981Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51206 2022-05-18T04:51:20.6979933Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:20.6982516Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:20.7187891Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:20.7191508Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:20.7192323Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:20.7194528Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:51:20.7288945Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:20.7291833Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:51:20.7292761Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:51:20.7297459Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:51:23.3803566Z ok (5.259s) 2022-05-18T04:51:23.3803803Z 2022-05-18T04:51:23.3804184Z ---------------------------------------------------------------------- 2022-05-18T04:51:23.3804794Z Ran 1 test in 5.259s 2022-05-18T04:51:23.3805086Z 2022-05-18T04:51:23.3805240Z OK 2022-05-18T04:51:23.3806036Z 2022-05-18T04:51:23.3806314Z Generating XML reports... 2022-05-18T04:51:23.3847977Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045118.xml 2022-05-18T04:51:24.5531377Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:51:24.5545932Z 2022-05-18T04:51:24.5546208Z Running tests... 2022-05-18T04:51:24.5546655Z ---------------------------------------------------------------------- 2022-05-18T04:51:26.1713830Z test_sequence_num_incremented_nccl_default (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:26.2065945Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51332 2022-05-18T04:51:26.2168861Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51333 2022-05-18T04:51:27.1549839Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:27.1558526Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:27.1745307Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:27.1754876Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:27.1755758Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:27.1763728Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:27.1973993Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:51:27.1974498Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:51:27.1975199Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:51:27.1975894Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:51:28.9247998Z ok (4.370s) 2022-05-18T04:51:28.9248217Z 2022-05-18T04:51:28.9248642Z ---------------------------------------------------------------------- 2022-05-18T04:51:28.9248976Z Ran 1 test in 4.370s 2022-05-18T04:51:28.9249150Z 2022-05-18T04:51:28.9249247Z OK 2022-05-18T04:51:28.9249385Z 2022-05-18T04:51:28.9249529Z Generating XML reports... 2022-05-18T04:51:28.9294703Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045124.xml 2022-05-18T04:51:30.0939734Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:51:30.0953881Z 2022-05-18T04:51:30.0954155Z Running tests... 2022-05-18T04:51:30.0955372Z ---------------------------------------------------------------------- 2022-05-18T04:51:31.6888409Z test_sequence_num_incremented_nccl_subgroup (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:31.7237252Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51460 2022-05-18T04:51:31.7343854Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51461 2022-05-18T04:51:32.6808774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:32.6922714Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:32.8384912Z skip: Need at least 4 CUDA devices (2.743s) 2022-05-18T04:51:32.8385162Z 2022-05-18T04:51:32.8385548Z ---------------------------------------------------------------------- 2022-05-18T04:51:32.8385893Z Ran 1 test in 2.743s 2022-05-18T04:51:32.8386062Z 2022-05-18T04:51:32.8386177Z OK (skipped=1) 2022-05-18T04:51:32.8386333Z 2022-05-18T04:51:32.8386467Z Generating XML reports... 
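Several CommTest cases in this run, including the one just above, are reported as "skip: Need at least 4 CUDA devices" because this runner exposes fewer GPUs than the test requires. A minimal sketch of that gating, assuming the decorator is the one from torch's internal distributed test utilities that produces this skip message (the test body is a placeholder):

```python
import unittest

import torch
from torch.testing._internal.common_distributed import skip_if_lt_x_gpu


class GpuCountGatedTest(unittest.TestCase):
    @skip_if_lt_x_gpu(4)
    def test_needs_four_gpus(self):
        # Runs only when at least 4 CUDA devices are visible; otherwise unittest
        # records the case as skipped, as in the log above.
        self.assertGreaterEqual(torch.cuda.device_count(), 4)
```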
2022-05-18T04:51:32.8443027Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045130.xml 2022-05-18T04:51:34.0096639Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:51:34.0111829Z 2022-05-18T04:51:34.0111986Z Running tests... 2022-05-18T04:51:34.0112696Z ---------------------------------------------------------------------- 2022-05-18T04:51:35.6577698Z test_sequence_num_set_default_pg_nccl (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:35.6926622Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51569 2022-05-18T04:51:35.7032812Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51570 2022-05-18T04:51:36.6006125Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:36.6015644Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:36.6060394Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:36.6071153Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:36.6072553Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:36.6119053Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:38.3107432Z ok (4.299s) 2022-05-18T04:51:38.3107633Z 2022-05-18T04:51:38.3108050Z ---------------------------------------------------------------------- 2022-05-18T04:51:38.3108395Z Ran 1 test in 4.299s 2022-05-18T04:51:38.3108563Z 2022-05-18T04:51:38.3108662Z OK 2022-05-18T04:51:38.3108798Z 2022-05-18T04:51:38.3108937Z Generating XML reports... 2022-05-18T04:51:38.3152016Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045134.xml 2022-05-18T04:51:39.4845276Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:51:39.4860131Z 2022-05-18T04:51:39.4860399Z Running tests... 2022-05-18T04:51:39.4860848Z ---------------------------------------------------------------------- 2022-05-18T04:51:41.1072966Z test_sequence_num_set_nccl_new_group (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:41.1427475Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51691 2022-05-18T04:51:41.1534511Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51692 2022-05-18T04:51:42.0479591Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:42.0487302Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:51:42.0524557Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:42.0534436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:51:42.0535681Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T04:51:42.0538029Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T04:51:42.0590402Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:51:42.0593016Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T04:51:42.0593687Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:51:42.0641250Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T04:51:43.7611427Z ok (4.275s) 2022-05-18T04:51:43.7611666Z 2022-05-18T04:51:43.7612081Z ---------------------------------------------------------------------- 2022-05-18T04:51:43.7612410Z Ran 1 test in 4.275s 2022-05-18T04:51:43.7612584Z 2022-05-18T04:51:43.7614288Z OK 2022-05-18T04:51:43.7614488Z 2022-05-18T04:51:43.7614932Z Generating XML reports... 2022-05-18T04:51:43.7657168Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045139.xml 2022-05-18T04:51:44.9343055Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:51:44.9357845Z 2022-05-18T04:51:44.9358124Z Running tests... 2022-05-18T04:51:44.9358576Z ---------------------------------------------------------------------- 2022-05-18T04:51:46.5281156Z test_accumulate_gradients_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:46.5630088Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51817 2022-05-18T04:51:46.5737392Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51818 2022-05-18T04:51:47.4329302Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:47.4773295Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:48.7104567Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8dk5a2e_ 2022-05-18T04:51:48.7105415Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8dk5a2e_/_remote_module_non_scriptable.py 2022-05-18T04:51:48.7794123Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp097cj5b8 2022-05-18T04:51:48.7795680Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp097cj5b8/_remote_module_non_scriptable.py 2022-05-18T04:51:50.1312118Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:51:50.1316812Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:51:50.4840989Z ok (5.548s) 2022-05-18T04:51:50.4841181Z 2022-05-18T04:51:50.4841593Z ---------------------------------------------------------------------- 2022-05-18T04:51:50.4842184Z Ran 1 test in 5.548s 2022-05-18T04:51:50.4842373Z 2022-05-18T04:51:50.4842470Z OK 2022-05-18T04:51:50.4842609Z 2022-05-18T04:51:50.4842725Z Generating XML reports... 
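test_accumulate_gradients_module, which just passed, exercises gradient accumulation under DistributedDataParallel. The log does not show the test body, so the following is only a sketch of the standard pattern (placeholder Linear model, batch size, and step count), using DDP's no_sync() to suppress the allreduce on intermediate micro-batches; the "Reducer buckets have been rebuilt in this iteration" INFO lines above are emitted on the first synchronizing backward pass.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def accumulate_gradients(rank: int, accumulation_steps: int = 4) -> None:
    # Assumes an initialized NCCL process group, as in the spawn sketch earlier.
    model = DDP(nn.Linear(8, 8).to(f"cuda:{rank}"), device_ids=[rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for step in range(accumulation_steps):
        x = torch.randn(2, 8, device=f"cuda:{rank}")
        if step < accumulation_steps - 1:
            # Intermediate micro-batches: skip the allreduce and let gradients
            # accumulate locally in param.grad.
            with model.no_sync():
                model(x).sum().backward()
        else:
            # Final micro-batch: backward synchronizes gradients across ranks,
            # then the optimizer applies the accumulated update.
            model(x).sum().backward()
            opt.step()
            opt.zero_grad()
```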
2022-05-18T04:51:50.4885893Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045144.xml 2022-05-18T04:51:51.6513157Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:51:51.6527671Z 2022-05-18T04:51:51.6527904Z Running tests... 2022-05-18T04:51:51.6528832Z ---------------------------------------------------------------------- 2022-05-18T04:51:53.2511569Z test_accumulate_gradients_module_with_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:51:53.2861845Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 51944 2022-05-18T04:51:53.2969803Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 51945 2022-05-18T04:51:54.1983072Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:51:54.2057477Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:51:55.5149909Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppwhlhysm 2022-05-18T04:51:55.5151055Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppwhlhysm/_remote_module_non_scriptable.py 2022-05-18T04:51:55.5404724Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpboou7y9q 2022-05-18T04:51:55.5406140Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpboou7y9q/_remote_module_non_scriptable.py 2022-05-18T04:51:56.8557766Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:51:56.8558570Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:51:57.2071886Z ok (5.554s) 2022-05-18T04:51:57.2072310Z 2022-05-18T04:51:57.2073091Z ---------------------------------------------------------------------- 2022-05-18T04:51:57.2073505Z Ran 1 test in 5.554s 2022-05-18T04:51:57.2073656Z 2022-05-18T04:51:57.2073754Z OK 2022-05-18T04:51:57.2073892Z 2022-05-18T04:51:57.2074028Z Generating XML reports... 2022-05-18T04:51:57.2117178Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045151.xml 2022-05-18T04:51:58.3869179Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:51:58.3883136Z 2022-05-18T04:51:58.3883815Z Running tests... 2022-05-18T04:51:58.3884701Z ---------------------------------------------------------------------- 2022-05-18T04:51:59.9901256Z test_arbitrary_forward_return_value (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:00.0256144Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52071 2022-05-18T04:52:00.0366422Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52072 2022-05-18T04:52:00.9357992Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:00.9416916Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:02.2423542Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3asu8_6l 2022-05-18T04:52:02.2424172Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3asu8_6l/_remote_module_non_scriptable.py 2022-05-18T04:52:02.2855628Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp51u25wws 2022-05-18T04:52:02.2857015Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp51u25wws/_remote_module_non_scriptable.py 2022-05-18T04:52:03.5583799Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:03.5584370Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:03.9471353Z ok (5.559s) 2022-05-18T04:52:03.9471591Z 2022-05-18T04:52:03.9472313Z ---------------------------------------------------------------------- 2022-05-18T04:52:03.9473030Z Ran 1 test in 5.559s 2022-05-18T04:52:03.9473252Z 2022-05-18T04:52:03.9473355Z OK 2022-05-18T04:52:03.9473498Z 2022-05-18T04:52:03.9476042Z Generating XML reports... 2022-05-18T04:52:03.9516948Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045158.xml 2022-05-18T04:52:05.1338073Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:52:05.1353353Z 2022-05-18T04:52:05.1353627Z Running tests... 2022-05-18T04:52:05.1354065Z ---------------------------------------------------------------------- 2022-05-18T04:52:06.7825233Z test_arbitrary_forward_return_value_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:06.8185020Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52198 2022-05-18T04:52:06.8294043Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52199 2022-05-18T04:52:07.7388138Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:07.7514253Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:09.0327684Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3ffzcp26 2022-05-18T04:52:09.0328832Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3ffzcp26/_remote_module_non_scriptable.py 2022-05-18T04:52:09.1173820Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjg1h2bp9 2022-05-18T04:52:09.1175173Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjg1h2bp9/_remote_module_non_scriptable.py 2022-05-18T04:52:10.3552991Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:10.3553568Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
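test_arbitrary_forward_return_value and its grad_is_view variant running here cover the fact that a DDP-wrapped module's forward may return nested, non-tensor structures rather than a single tensor. A small hedged sketch of such a module (dict keys and shapes are placeholders):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class DictOutputNet(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x: torch.Tensor) -> dict:
        # DDP traverses nested containers in the output to find tensors that
        # require grad, so returning a dict works.
        h = self.fc(x)
        return {"logits": h, "norm": h.norm()}


def run(rank: int) -> None:
    # Assumes an initialized NCCL process group, as in the spawn sketch earlier.
    model = DDP(DictOutputNet().to(f"cuda:{rank}"), device_ids=[rank])
    out = model(torch.randn(2, 8, device=f"cuda:{rank}"))
    out["logits"].sum().backward()
```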
2022-05-18T04:52:10.7398010Z ok (5.604s) 2022-05-18T04:52:10.7398244Z 2022-05-18T04:52:10.7398654Z ---------------------------------------------------------------------- 2022-05-18T04:52:10.7399004Z Ran 1 test in 5.604s 2022-05-18T04:52:10.7399155Z 2022-05-18T04:52:10.7399253Z OK 2022-05-18T04:52:10.7399395Z 2022-05-18T04:52:10.7399530Z Generating XML reports... 2022-05-18T04:52:10.7442121Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045205.xml 2022-05-18T04:52:11.9165944Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:52:11.9180041Z 2022-05-18T04:52:11.9180210Z Running tests... 2022-05-18T04:52:11.9180678Z ---------------------------------------------------------------------- 2022-05-18T04:52:13.5222255Z test_bf16_compress_wrapper_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:13.5571368Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52325 2022-05-18T04:52:13.5680025Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52326 2022-05-18T04:52:14.4716772Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:14.4718220Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:52:14.4762287Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:14.4765123Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:52:15.7988650Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_n4v7kdn 2022-05-18T04:52:15.7989273Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_n4v7kdn/_remote_module_non_scriptable.py 2022-05-18T04:52:15.8121476Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeviag9e2 2022-05-18T04:52:15.8124162Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeviag9e2/_remote_module_non_scriptable.py 2022-05-18T04:52:17.1776507Z ok (5.259s) 2022-05-18T04:52:17.1776742Z 2022-05-18T04:52:17.1777149Z ---------------------------------------------------------------------- 2022-05-18T04:52:17.1777477Z Ran 1 test in 5.260s 2022-05-18T04:52:17.1777646Z 2022-05-18T04:52:17.1777767Z OK 2022-05-18T04:52:17.1777904Z 2022-05-18T04:52:17.1778039Z Generating XML reports... 2022-05-18T04:52:17.1821192Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045211.xml 2022-05-18T04:52:18.3796119Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:52:18.3811228Z 2022-05-18T04:52:18.3811679Z Running tests... 
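The test_bf16_compress_wrapper_* cases around this point register a gradient-compression communication hook, and the PowerSGD config INFO lines above print its parameters (matrix_approximation_rank = 1, start_powerSGD_iter = 1000, ...). A hedged sketch of that registration, wrapping the PowerSGD hook so buckets are cast to bfloat16 around the allreduce (placeholder model; per rank, inside an initialized NCCL group):

```python
import torch.nn as nn
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks, powerSGD_hook
from torch.nn.parallel import DistributedDataParallel as DDP


def build_ddp_with_bf16_powersgd(rank: int) -> DDP:
    model = DDP(nn.Linear(8, 8).to(f"cuda:{rank}"), device_ids=[rank])
    state = powerSGD_hook.PowerSGDState(
        process_group=None,            # None = the default process group
        matrix_approximation_rank=1,   # values match the PowerSGD config INFO line
        start_powerSGD_iter=1000,
    )
    # bf16_compress_wrapper casts each bucket to bfloat16 before PowerSGD runs
    # and casts the result back afterwards.
    model.register_comm_hook(
        state, default_hooks.bf16_compress_wrapper(powerSGD_hook.powerSGD_hook)
    )
    return model
```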
2022-05-18T04:52:18.3812190Z ---------------------------------------------------------------------- 2022-05-18T04:52:20.0452207Z test_bf16_compress_wrapper_nccl (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:20.0809912Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52452 2022-05-18T04:52:20.0918321Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52453 2022-05-18T04:52:20.9744190Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:20.9746132Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:52:20.9950066Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:20.9953174Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:52:22.2890507Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph1jvmmny 2022-05-18T04:52:22.2891598Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph1jvmmny/_remote_module_non_scriptable.py 2022-05-18T04:52:22.3014649Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkv3lv4_7 2022-05-18T04:52:22.3017296Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkv3lv4_7/_remote_module_non_scriptable.py 2022-05-18T04:52:23.7017458Z ok (5.320s) 2022-05-18T04:52:23.7017683Z 2022-05-18T04:52:23.7018108Z ---------------------------------------------------------------------- 2022-05-18T04:52:23.7018465Z Ran 1 test in 5.321s 2022-05-18T04:52:23.7018658Z 2022-05-18T04:52:23.7018758Z OK 2022-05-18T04:52:23.7018899Z 2022-05-18T04:52:23.7019017Z Generating XML reports... 2022-05-18T04:52:23.7062786Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045218.xml 2022-05-18T04:52:24.8921632Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:52:24.8936493Z 2022-05-18T04:52:24.8936714Z Running tests... 2022-05-18T04:52:24.8937176Z ---------------------------------------------------------------------- 2022-05-18T04:52:26.5291858Z test_builtin_ddp_comm_hooks_nccl (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:26.5642221Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52579 2022-05-18T04:52:26.5749505Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52580 2022-05-18T04:52:27.4989388Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:27.5278354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:28.8022382Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu__3qn25 2022-05-18T04:52:28.8023934Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu__3qn25/_remote_module_non_scriptable.py 2022-05-18T04:52:28.8091580Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpivwh3zj6 2022-05-18T04:52:28.8095502Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpivwh3zj6/_remote_module_non_scriptable.py 2022-05-18T04:52:30.1848316Z ok (5.291s) 2022-05-18T04:52:30.1848552Z 2022-05-18T04:52:30.1848958Z ---------------------------------------------------------------------- 2022-05-18T04:52:30.1849307Z Ran 1 test in 5.291s 2022-05-18T04:52:30.1849482Z 2022-05-18T04:52:30.1849560Z OK 2022-05-18T04:52:30.1849699Z 2022-05-18T04:52:30.1849835Z Generating XML reports... 2022-05-18T04:52:30.1895267Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045224.xml 2022-05-18T04:52:31.3836383Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:52:31.3851124Z 2022-05-18T04:52:31.3851280Z Running tests... 2022-05-18T04:52:31.3852209Z ---------------------------------------------------------------------- 2022-05-18T04:52:33.0271215Z test_builtin_ddp_comm_hooks_nccl_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:33.0623072Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52706 2022-05-18T04:52:33.0731687Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52707 2022-05-18T04:52:34.0254697Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:34.0307330Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:35.3565612Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1d0jnu72 2022-05-18T04:52:35.3567335Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1d0jnu72/_remote_module_non_scriptable.py 2022-05-18T04:52:35.3710328Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp55zjscdd 2022-05-18T04:52:35.3714199Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp55zjscdd/_remote_module_non_scriptable.py 2022-05-18T04:52:36.7833930Z ok (5.398s) 2022-05-18T04:52:36.7834237Z 2022-05-18T04:52:36.7834759Z ---------------------------------------------------------------------- 2022-05-18T04:52:36.7835241Z Ran 1 test in 5.398s 2022-05-18T04:52:36.7835494Z 2022-05-18T04:52:36.7835596Z OK 2022-05-18T04:52:36.7835735Z 2022-05-18T04:52:36.7835874Z Generating XML reports... 
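test_builtin_ddp_comm_hooks_nccl and its grad_is_view variant, which just finished, together with the test_default_ddp_comm_hooks_* and test_fp16* cases later in this run, revolve around the stock DDP gradient-compression hooks. A minimal hedged sketch using the fp16 compression hook (placeholder model; per rank, inside an initialized NCCL group; the real tests may register the hooks through a different code path):

```python
import torch.nn as nn
from torch.distributed.algorithms.ddp_comm_hooks import default_hooks
from torch.nn.parallel import DistributedDataParallel as DDP


def build_ddp_with_fp16_hook(rank: int) -> DDP:
    model = DDP(nn.Linear(8, 8).to(f"cuda:{rank}"), device_ids=[rank])
    # Gradient buckets are cast to float16 for the allreduce and restored to the
    # original dtype afterwards; state=None means the default process group.
    model.register_comm_hook(state=None, hook=default_hooks.fp16_compress_hook)
    return model
```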
2022-05-18T04:52:36.7879405Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045231.xml 2022-05-18T04:52:37.9559443Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:52:37.9573805Z 2022-05-18T04:52:37.9573939Z Running tests... 2022-05-18T04:52:37.9574393Z ---------------------------------------------------------------------- 2022-05-18T04:52:37.9582977Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2022-05-18T04:52:39.5677295Z Dynamic module can be checkpointed, multiple times, with non-reentrant ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:39.6030491Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52833 2022-05-18T04:52:39.6136991Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52834 2022-05-18T04:52:40.5134974Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:40.5433648Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:41.8454852Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnq9g1_cy 2022-05-18T04:52:41.8455506Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnq9g1_cy/_remote_module_non_scriptable.py 2022-05-18T04:52:41.8522652Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp1f7z8mt 2022-05-18T04:52:41.8525540Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp1f7z8mt/_remote_module_non_scriptable.py 2022-05-18T04:52:42.5217731Z ok (4.564s) 2022-05-18T04:52:42.5217961Z 2022-05-18T04:52:42.5218362Z ---------------------------------------------------------------------- 2022-05-18T04:52:42.5218711Z Ran 1 test in 4.564s 2022-05-18T04:52:42.5218860Z 2022-05-18T04:52:42.5218958Z OK 2022-05-18T04:52:42.5219100Z 2022-05-18T04:52:42.5219236Z Generating XML reports... 2022-05-18T04:52:42.5262694Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045237.xml 2022-05-18T04:52:43.7042269Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:52:43.7057447Z 2022-05-18T04:52:43.7057966Z Running tests... 2022-05-18T04:52:43.7058508Z ---------------------------------------------------------------------- 2022-05-18T04:52:43.7066920Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:52:45.3525329Z Dynamic module can be checkpointed multiple times with weight sharing ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:45.3883432Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 52959 2022-05-18T04:52:45.3991482Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 52960 2022-05-18T04:52:46.3245900Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:46.3420641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:47.6333764Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8veqdxgg 2022-05-18T04:52:47.6334649Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8veqdxgg/_remote_module_non_scriptable.py 2022-05-18T04:52:47.6631707Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe7qa4h3h 2022-05-18T04:52:47.6634547Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe7qa4h3h/_remote_module_non_scriptable.py 2022-05-18T04:52:48.4075865Z ok (4.701s) 2022-05-18T04:52:48.4076090Z 2022-05-18T04:52:48.4076538Z ---------------------------------------------------------------------- 2022-05-18T04:52:48.4076899Z Ran 1 test in 4.702s 2022-05-18T04:52:48.4077050Z 2022-05-18T04:52:48.4077155Z OK 2022-05-18T04:52:48.4077297Z 2022-05-18T04:52:48.4077445Z Generating XML reports... 2022-05-18T04:52:48.4122333Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045243.xml 2022-05-18T04:52:49.5539589Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:52:49.5553838Z 2022-05-18T04:52:49.5554316Z Running tests... 2022-05-18T04:52:49.5554837Z ---------------------------------------------------------------------- 2022-05-18T04:52:49.5565583Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:52:51.1514029Z DDP works as expected when layer is checkpointed only once. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:51.1863170Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53085 2022-05-18T04:52:51.1968817Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53086 2022-05-18T04:52:52.0344838Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:52.0830335Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:53.3252433Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphmc323_2 2022-05-18T04:52:53.3253727Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphmc323_2/_remote_module_non_scriptable.py 2022-05-18T04:52:53.3667264Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdz32xumu 2022-05-18T04:52:53.3669324Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdz32xumu/_remote_module_non_scriptable.py 2022-05-18T04:52:53.7360772Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:53.7361361Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:53.7648167Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:53.7648665Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
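For orientation, the test_ddp_checkpointing_* cases above exercise torch.utils.checkpoint inside a DistributedDataParallel module, in both its reentrant and non-reentrant forms. A minimal sketch of the pattern, using a hypothetical two-layer module rather than the tests' actual model (process-group setup omitted):

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class ToyCheckpointedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(20, 20)
        self.l2 = nn.Linear(20, 20)

    def forward(self, x):
        # Activations of l1 are recomputed during backward instead of stored.
        # use_reentrant=False selects the non-reentrant implementation covered
        # by the *_use_reentrant_False tests; True selects the reentrant one,
        # which only tolerates repeated checkpointing with a static graph.
        x = checkpoint(self.l1, x, use_reentrant=False)
        return self.l2(x)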
2022-05-18T04:52:53.7796500Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:52:53.7797501Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T04:52:53.7798641Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:52:53.7799465Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T04:52:53.7905302Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:53.7906129Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:53.8115124Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:53.8115628Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:53.8402142Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:53.8402658Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:53.8650924Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:53.8651423Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:54.2063640Z ok (4.651s) 2022-05-18T04:52:54.2063864Z 2022-05-18T04:52:54.2064280Z ---------------------------------------------------------------------- 2022-05-18T04:52:54.2064647Z Ran 1 test in 4.651s 2022-05-18T04:52:54.2064817Z 2022-05-18T04:52:54.2064894Z OK 2022-05-18T04:52:54.2065033Z 2022-05-18T04:52:54.2065169Z Generating XML reports... 2022-05-18T04:52:54.2108240Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045249.xml 2022-05-18T04:52:55.3945729Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:52:55.3960768Z 2022-05-18T04:52:55.3960912Z Running tests... 2022-05-18T04:52:55.3961600Z ---------------------------------------------------------------------- 2022-05-18T04:52:55.3973044Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:52:57.0503107Z DDP works as expected when layer is checkpointed only once. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:52:57.0862757Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53211 2022-05-18T04:52:57.0968752Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53212 2022-05-18T04:52:58.0249976Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:52:58.0423164Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:52:59.3627497Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpukr7k8r7 2022-05-18T04:52:59.3628123Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpukr7k8r7/_remote_module_non_scriptable.py 2022-05-18T04:52:59.3803470Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7b7ejm1f 2022-05-18T04:52:59.3806216Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7b7ejm1f/_remote_module_non_scriptable.py 2022-05-18T04:52:59.7438614Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:59.7454737Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:59.7773277Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:59.7779618Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:59.7943786Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:52:59.7944680Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T04:52:59.7945842Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:52:59.7946986Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T04:52:59.8053511Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:59.8060014Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:59.8279812Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:59.8283419Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:59.8597394Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:59.8603074Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:59.8869477Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:52:59.8875390Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
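The UserWarning from distributed.py:1737 repeated above concerns combining find_unused_parameters=True with a static graph. A hedged sketch of the two spellings the warning contrasts; `model` and `device_id` are placeholders and an initialized NCCL process group is assumed:

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

model = nn.Linear(20, 20).cuda(device_id)  # device_id: this rank's GPU (assumed)

# Public spelling: declare the graph static up front.
ddp = DDP(model, device_ids=[device_id], static_graph=True)

# Internal spelling the warning refers to: _set_static_graph() already detects
# unused parameters, so find_unused_parameters=True is redundant alongside it.
# ddp = DDP(model, device_ids=[device_id], find_unused_parameters=True)
# ddp._set_static_graph()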
2022-05-18T04:53:00.2055379Z ok (4.809s) 2022-05-18T04:53:00.2055652Z 2022-05-18T04:53:00.2056282Z ---------------------------------------------------------------------- 2022-05-18T04:53:00.2056646Z Ran 1 test in 4.809s 2022-05-18T04:53:00.2056819Z 2022-05-18T04:53:00.2056915Z OK 2022-05-18T04:53:00.2057052Z 2022-05-18T04:53:00.2057189Z Generating XML reports... 2022-05-18T04:53:00.2100161Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045255.xml 2022-05-18T04:53:01.3970131Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:53:01.3985355Z 2022-05-18T04:53:01.3985696Z Running tests... 2022-05-18T04:53:01.3986247Z ---------------------------------------------------------------------- 2022-05-18T04:53:01.3994741Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:53:03.0186204Z Regardless of reentrant or non-reentrant checkpointing impl, ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:03.0541191Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53337 2022-05-18T04:53:03.0646037Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53338 2022-05-18T04:53:03.9660193Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:04.0049144Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:05.2776269Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3e7ll2p7 2022-05-18T04:53:05.2777385Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3e7ll2p7/_remote_module_non_scriptable.py 2022-05-18T04:53:05.3100066Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwyz88yn3 2022-05-18T04:53:05.3102271Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwyz88yn3/_remote_module_non_scriptable.py 2022-05-18T04:53:05.6791801Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:05.6792488Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:05.7097609Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:05.7098150Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:05.9729309Z ok (4.574s) 2022-05-18T04:53:05.9729504Z 2022-05-18T04:53:05.9730034Z ---------------------------------------------------------------------- 2022-05-18T04:53:05.9730735Z Ran 1 test in 4.574s 2022-05-18T04:53:05.9730928Z 2022-05-18T04:53:05.9731026Z OK 2022-05-18T04:53:05.9731164Z 2022-05-18T04:53:05.9731521Z Generating XML reports... 2022-05-18T04:53:05.9775516Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045301.xml 2022-05-18T04:53:07.1657683Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:53:07.1672911Z 2022-05-18T04:53:07.1673135Z Running tests... 2022-05-18T04:53:07.1673886Z ---------------------------------------------------------------------- 2022-05-18T04:53:07.1682184Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:53:08.8034611Z Regardless of reentrant or non-reentrant checkpointing impl, ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:08.8382050Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53463 2022-05-18T04:53:08.8487650Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53464 2022-05-18T04:53:09.7510395Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:09.7592969Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:11.0824109Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvlmn1_fx 2022-05-18T04:53:11.0824731Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvlmn1_fx/_remote_module_non_scriptable.py 2022-05-18T04:53:11.0944005Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpka0xuk4s 2022-05-18T04:53:11.0946861Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpka0xuk4s/_remote_module_non_scriptable.py 2022-05-18T04:53:11.4763783Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:11.4764342Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:11.5095204Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:11.5095723Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:11.8570607Z ok (4.689s) 2022-05-18T04:53:11.8570830Z 2022-05-18T04:53:11.8571466Z ---------------------------------------------------------------------- 2022-05-18T04:53:11.8571807Z Ran 1 test in 4.690s 2022-05-18T04:53:11.8571978Z 2022-05-18T04:53:11.8572081Z OK 2022-05-18T04:53:11.8572245Z 2022-05-18T04:53:11.8573230Z Generating XML reports... 2022-05-18T04:53:11.8615628Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045307.xml 2022-05-18T04:53:13.0517308Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:53:13.0532628Z 2022-05-18T04:53:13.0533125Z Running tests... 2022-05-18T04:53:13.0533619Z ---------------------------------------------------------------------- 2022-05-18T04:53:13.0546716Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:53:14.7095743Z Checkpointing twice fails for non-static graph with reentrant checkpoint ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:14.7443998Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53589 2022-05-18T04:53:14.7552011Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53590 2022-05-18T04:53:15.6607978Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:15.6641400Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:16.9889419Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi6lhd5gm 2022-05-18T04:53:16.9890038Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi6lhd5gm/_remote_module_non_scriptable.py 2022-05-18T04:53:17.0033172Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbwskw3a6 2022-05-18T04:53:17.0035960Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbwskw3a6/_remote_module_non_scriptable.py 2022-05-18T04:53:17.3627374Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:17.3643226Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:17.3904585Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T04:53:17.3906186Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T04:53:17.4276205Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:17.4281780Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:17.7637649Z ok (4.710s) 2022-05-18T04:53:17.7638092Z 2022-05-18T04:53:17.7638750Z ---------------------------------------------------------------------- 2022-05-18T04:53:17.7639355Z Ran 1 test in 4.710s 2022-05-18T04:53:17.7639667Z 2022-05-18T04:53:17.7639838Z OK 2022-05-18T04:53:17.7640095Z 2022-05-18T04:53:17.7640341Z Generating XML reports... 2022-05-18T04:53:17.7685719Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045313.xml 2022-05-18T04:53:18.9266673Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:53:18.9281974Z 2022-05-18T04:53:18.9282261Z Running tests... 
2022-05-18T04:53:18.9282730Z ---------------------------------------------------------------------- 2022-05-18T04:53:18.9295195Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:53:20.5774514Z Checkpointing twice fails for non-static graph with reentrant checkpoint ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:20.6133232Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53715 2022-05-18T04:53:20.6240820Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53716 2022-05-18T04:53:21.5603987Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:21.5912337Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:22.8731180Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8jsr0bx_ 2022-05-18T04:53:22.8731782Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8jsr0bx_/_remote_module_non_scriptable.py 2022-05-18T04:53:22.8782239Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsc49x6v_ 2022-05-18T04:53:22.8784926Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsc49x6v_/_remote_module_non_scriptable.py 2022-05-18T04:53:23.6327482Z ok (4.704s) 2022-05-18T04:53:23.6327735Z 2022-05-18T04:53:23.6328160Z ---------------------------------------------------------------------- 2022-05-18T04:53:23.6328510Z Ran 1 test in 4.704s 2022-05-18T04:53:23.6328680Z 2022-05-18T04:53:23.6328777Z OK 2022-05-18T04:53:23.6328897Z 2022-05-18T04:53:23.6329355Z Generating XML reports... 2022-05-18T04:53:23.6374201Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045318.xml 2022-05-18T04:53:24.8323952Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:53:24.8338901Z 2022-05-18T04:53:24.8339328Z Running tests... 2022-05-18T04:53:24.8340132Z ---------------------------------------------------------------------- 2022-05-18T04:53:24.8348260Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T04:53:26.4925027Z Checkpointing should work with static graph in the case of checkpointing ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:26.5284163Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53841 2022-05-18T04:53:26.5392641Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53842 2022-05-18T04:53:27.4419187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:27.4864666Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:28.7448275Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplvyfagq9 2022-05-18T04:53:28.7449492Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplvyfagq9/_remote_module_non_scriptable.py 2022-05-18T04:53:28.7636448Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps9ipc3bs 2022-05-18T04:53:28.7639268Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps9ipc3bs/_remote_module_non_scriptable.py 2022-05-18T04:53:29.1343815Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:53:29.1344824Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:29.1647826Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:29.1648823Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:29.4474010Z ok (4.613s) 2022-05-18T04:53:29.4474204Z 2022-05-18T04:53:29.4474573Z ---------------------------------------------------------------------- 2022-05-18T04:53:29.4474927Z Ran 1 test in 4.613s 2022-05-18T04:53:29.4475098Z 2022-05-18T04:53:29.4475196Z OK 2022-05-18T04:53:29.4475333Z 2022-05-18T04:53:29.4475476Z Generating XML reports... 2022-05-18T04:53:29.4518532Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045324.xml 2022-05-18T04:53:30.6356879Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:53:30.6372231Z 2022-05-18T04:53:30.6372680Z Running tests... 2022-05-18T04:53:30.6373200Z ---------------------------------------------------------------------- 2022-05-18T04:53:30.6386034Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:53:32.2611881Z With reentrant autograd checkpointing impl, DDP will fail when there are ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:32.2973767Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 53967 2022-05-18T04:53:32.3083619Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 53968 2022-05-18T04:53:33.2142388Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:33.2210919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:34.5342163Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpepl1dosm 2022-05-18T04:53:34.5343314Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpepl1dosm/_remote_module_non_scriptable.py 2022-05-18T04:53:34.5787177Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnylq689p 2022-05-18T04:53:34.5788702Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnylq689p/_remote_module_non_scriptable.py 2022-05-18T04:53:34.9358346Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T04:53:34.9440827Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2022-05-18T04:53:34.9717296Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:53:34.9718929Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T04:53:34.9721110Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:53:34.9722222Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T04:53:34.9827553Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:34.9828509Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:35.0364044Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:35.0365039Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:35.3168869Z ok (4.679s) 2022-05-18T04:53:35.3169106Z 2022-05-18T04:53:35.3169484Z ---------------------------------------------------------------------- 2022-05-18T04:53:35.3169835Z Ran 1 test in 4.680s 2022-05-18T04:53:35.3170005Z 2022-05-18T04:53:35.3170106Z OK 2022-05-18T04:53:35.3170536Z 2022-05-18T04:53:35.3170817Z Generating XML reports... 2022-05-18T04:53:35.3215090Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045330.xml 2022-05-18T04:53:36.4510347Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:53:36.4525039Z 2022-05-18T04:53:36.4525556Z Running tests... 2022-05-18T04:53:36.4526113Z ---------------------------------------------------------------------- 2022-05-18T04:53:36.4538758Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:53:38.0824491Z With reentrant autograd checkpointing impl, DDP will fail when there are ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:38.1185256Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54093 2022-05-18T04:53:38.1295702Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54094 2022-05-18T04:53:39.0329365Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:39.0562962Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:40.3638217Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfjfpj5vk 2022-05-18T04:53:40.3639099Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfjfpj5vk/_remote_module_non_scriptable.py 2022-05-18T04:53:40.3838050Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp0zh6gvr 2022-05-18T04:53:40.3840323Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp0zh6gvr/_remote_module_non_scriptable.py 2022-05-18T04:53:40.7638369Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:53:40.7639302Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T04:53:40.7640459Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T04:53:40.7641287Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T04:53:40.7765271Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:40.7765956Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:40.8171665Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:40.8174493Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:41.1380455Z ok (4.685s) 2022-05-18T04:53:41.1380681Z 2022-05-18T04:53:41.1381153Z ---------------------------------------------------------------------- 2022-05-18T04:53:41.1381655Z Ran 1 test in 4.685s 2022-05-18T04:53:41.1381809Z 2022-05-18T04:53:41.1381916Z OK 2022-05-18T04:53:41.1382080Z 2022-05-18T04:53:41.1382225Z Generating XML reports... 2022-05-18T04:53:41.1428030Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045336.xml 2022-05-18T04:53:42.3081559Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:53:42.3098498Z 2022-05-18T04:53:42.3098995Z Running tests... 2022-05-18T04:53:42.3099885Z ---------------------------------------------------------------------- 2022-05-18T04:53:42.3116070Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T04:53:43.9703141Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:44.0068422Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54219 2022-05-18T04:53:44.0178296Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54220 2022-05-18T04:53:44.9234794Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:44.9252307Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:46.2316947Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp95yeth8x 2022-05-18T04:53:46.2317804Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp95yeth8x/_remote_module_non_scriptable.py 2022-05-18T04:53:46.2645026Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgsklxqnx 2022-05-18T04:53:46.2647450Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgsklxqnx/_remote_module_non_scriptable.py 2022-05-18T04:53:46.6329098Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:46.6329640Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:46.6676137Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:46.6676649Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:47.0264749Z ok (4.716s) 2022-05-18T04:53:47.0265158Z 2022-05-18T04:53:47.0265655Z ---------------------------------------------------------------------- 2022-05-18T04:53:47.0266060Z Ran 1 test in 4.717s 2022-05-18T04:53:47.0266236Z 2022-05-18T04:53:47.0266346Z OK 2022-05-18T04:53:47.0266484Z 2022-05-18T04:53:47.0266627Z Generating XML reports... 2022-05-18T04:53:47.0311166Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045342.xml 2022-05-18T04:53:48.2005294Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:53:48.2020418Z 2022-05-18T04:53:48.2020945Z Running tests... 2022-05-18T04:53:48.2021447Z ---------------------------------------------------------------------- 2022-05-18T04:53:48.2035679Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T04:53:49.8203911Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:49.8556983Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54345 2022-05-18T04:53:49.8662353Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54346 2022-05-18T04:53:50.8303748Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:50.8324982Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:52.1559597Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqdr1h3np 2022-05-18T04:53:52.1560820Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqdr1h3np/_remote_module_non_scriptable.py 2022-05-18T04:53:52.1731788Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm9pj_eud 2022-05-18T04:53:52.1735106Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm9pj_eud/_remote_module_non_scriptable.py 2022-05-18T04:53:52.5392300Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:52.5392841Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:52.5687881Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:52.5688375Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:52.5892147Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:52.5892647Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:52.6183859Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:52.6184355Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:53:52.8747045Z ok (4.672s) 2022-05-18T04:53:52.8747266Z 2022-05-18T04:53:52.8747650Z ---------------------------------------------------------------------- 2022-05-18T04:53:52.8748000Z Ran 1 test in 4.673s 2022-05-18T04:53:52.8748174Z 2022-05-18T04:53:52.8748273Z OK 2022-05-18T04:53:52.8748417Z 2022-05-18T04:53:52.8748555Z Generating XML reports... 2022-05-18T04:53:52.8790845Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045348.xml 2022-05-18T04:53:54.0584828Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:53:54.0599528Z 2022-05-18T04:53:54.0599899Z Running tests... 2022-05-18T04:53:54.0600352Z ---------------------------------------------------------------------- 2022-05-18T04:53:55.7080022Z test_ddp_comm_hook_allreduce_hook_nccl (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:53:55.7440596Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54471 2022-05-18T04:53:55.7548736Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54472 2022-05-18T04:53:56.6600139Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:53:56.6946928Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:53:57.9806278Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzl84shq0 2022-05-18T04:53:57.9807297Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzl84shq0/_remote_module_non_scriptable.py 2022-05-18T04:53:57.9957648Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkr3u00x7 2022-05-18T04:53:57.9961086Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkr3u00x7/_remote_module_non_scriptable.py 2022-05-18T04:53:59.3649922Z ok (5.305s) 2022-05-18T04:53:59.3650718Z 2022-05-18T04:53:59.3651407Z ---------------------------------------------------------------------- 2022-05-18T04:53:59.3652041Z Ran 1 test in 5.305s 2022-05-18T04:53:59.3652352Z 2022-05-18T04:53:59.3652534Z OK 2022-05-18T04:53:59.3652783Z 2022-05-18T04:53:59.3653033Z Generating XML reports... 2022-05-18T04:53:59.3697088Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045354.xml 2022-05-18T04:54:00.5261044Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:54:00.5276095Z 2022-05-18T04:54:00.5276243Z Running tests... 2022-05-18T04:54:00.5277103Z ---------------------------------------------------------------------- 2022-05-18T04:54:02.1844603Z test_ddp_comm_hook_allreduce_hook_nccl_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:02.2209835Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54598 2022-05-18T04:54:02.2320986Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54599 2022-05-18T04:54:03.1597966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:03.1625018Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:04.4855575Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphdk6b8xw 2022-05-18T04:54:04.4857017Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphdk6b8xw/_remote_module_non_scriptable.py 2022-05-18T04:54:04.4876786Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1b_vsog2 2022-05-18T04:54:04.4880160Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1b_vsog2/_remote_module_non_scriptable.py 2022-05-18T04:54:05.8416791Z ok (5.314s) 2022-05-18T04:54:05.8418361Z 2022-05-18T04:54:05.8419278Z ---------------------------------------------------------------------- 2022-05-18T04:54:05.8419707Z Ran 1 test in 5.314s 2022-05-18T04:54:05.8419878Z 2022-05-18T04:54:05.8419986Z OK 2022-05-18T04:54:05.8420124Z 2022-05-18T04:54:05.8420259Z Generating XML reports... 
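The test_ddp_comm_hook_* and test_*_ddp_comm_hooks_* cases running here register gradient-communication hooks on the DDP module. A minimal sketch using one of the built-in default hooks; `model` and `device_id` are placeholders and an initialized NCCL process group is assumed:

from torch.distributed.algorithms.ddp_comm_hooks import default_hooks
from torch.nn.parallel import DistributedDataParallel as DDP

ddp = DDP(model, device_ids=[device_id])

# state=None means the default process group is used; fp16_compress_hook is
# another built-in hook that casts gradients to half precision before allreduce.
ddp.register_comm_hook(state=None, hook=default_hooks.allreduce_hook)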
2022-05-18T04:54:05.8461703Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045400.xml 2022-05-18T04:54:07.0377212Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:54:07.0392238Z 2022-05-18T04:54:07.0392815Z Running tests... 2022-05-18T04:54:07.0393697Z ---------------------------------------------------------------------- 2022-05-18T04:54:08.6778929Z test_ddp_comm_hook_allreduce_hook_nccl_static_graph (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:08.7128879Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54725 2022-05-18T04:54:08.7237243Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54726 2022-05-18T04:54:09.6721409Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:09.7209032Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:10.9438427Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo7rqhds4 2022-05-18T04:54:10.9439746Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo7rqhds4/_remote_module_non_scriptable.py 2022-05-18T04:54:11.0283033Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl8l7nerh 2022-05-18T04:54:11.0284461Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl8l7nerh/_remote_module_non_scriptable.py 2022-05-18T04:54:12.3332151Z ok (5.294s) 2022-05-18T04:54:12.3332362Z 2022-05-18T04:54:12.3332756Z ---------------------------------------------------------------------- 2022-05-18T04:54:12.3333109Z Ran 1 test in 5.294s 2022-05-18T04:54:12.3333256Z 2022-05-18T04:54:12.3333366Z OK 2022-05-18T04:54:12.3333503Z 2022-05-18T04:54:12.3333637Z Generating XML reports... 2022-05-18T04:54:12.3377591Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045407.xml 2022-05-18T04:54:13.5371424Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:54:13.5386943Z 2022-05-18T04:54:13.5387267Z Running tests... 2022-05-18T04:54:13.5387693Z ---------------------------------------------------------------------- 2022-05-18T04:54:13.5402225Z test_ddp_comm_hook_allreduce_with_then_hook_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:54:15.1894859Z This unit test verifies whether a DDP communication hook that calls allreduce and then ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:15.2260050Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54852 2022-05-18T04:54:15.2369439Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54853 2022-05-18T04:54:16.1317208Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:16.1800475Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:17.4200831Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt6fqj0a7 2022-05-18T04:54:17.4202792Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt6fqj0a7/_remote_module_non_scriptable.py 2022-05-18T04:54:17.5254067Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8ljbpaic 2022-05-18T04:54:17.5256113Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8ljbpaic/_remote_module_non_scriptable.py 2022-05-18T04:54:18.8469023Z ok (5.308s) 2022-05-18T04:54:18.8469280Z 2022-05-18T04:54:18.8469872Z ---------------------------------------------------------------------- 2022-05-18T04:54:18.8470226Z Ran 1 test in 5.308s 2022-05-18T04:54:18.8470392Z 2022-05-18T04:54:18.8470508Z OK 2022-05-18T04:54:18.8470648Z 2022-05-18T04:54:18.8470792Z Generating XML reports... 2022-05-18T04:54:18.8516971Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045413.xml 2022-05-18T04:54:20.0491012Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:54:20.0506281Z 2022-05-18T04:54:20.0506837Z Running tests... 2022-05-18T04:54:20.0507370Z ---------------------------------------------------------------------- 2022-05-18T04:54:20.0516428Z test_ddp_comm_hook_future_passing_gpu_nccl (__main__.DistributedDataParallelTest) 2022-05-18T04:54:21.7041226Z This unit test verifies whether the Future object is passed properly using nccl backend. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:21.7403573Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 54979 2022-05-18T04:54:21.7514119Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 54980 2022-05-18T04:54:22.6567028Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:22.6875263Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:23.9600730Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf3hv880r 2022-05-18T04:54:23.9602059Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf3hv880r/_remote_module_non_scriptable.py 2022-05-18T04:54:24.0170698Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkv2o9c2c 2022-05-18T04:54:24.0172449Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkv2o9c2c/_remote_module_non_scriptable.py 2022-05-18T04:54:25.3611077Z ok (5.310s) 2022-05-18T04:54:25.3611319Z 2022-05-18T04:54:25.3611911Z ---------------------------------------------------------------------- 2022-05-18T04:54:25.3612278Z Ran 1 test in 5.310s 2022-05-18T04:54:25.3612445Z 2022-05-18T04:54:25.3612560Z OK 2022-05-18T04:54:25.3612696Z 2022-05-18T04:54:25.3612839Z Generating XML reports... 
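test_ddp_comm_hook_allreduce_with_then_hook_nccl above checks a hook that launches an asynchronous allreduce and chains a callback on the returned Future. A hedged sketch of that shape of hook; the scaling callback is illustrative, not the test's exact logic:

import torch
import torch.distributed as dist

def allreduce_then_scale_hook(state, bucket):
    # bucket.buffer() is the flattened gradient tensor for this bucket.
    tensor = bucket.buffer()
    fut = dist.all_reduce(tensor, async_op=True).get_future()

    def scale(f):
        # Runs after the allreduce completes; average across ranks.
        return f.value()[0] / dist.get_world_size()

    return fut.then(scale)

# ddp_model.register_comm_hook(state=None, hook=allreduce_then_scale_hook)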
2022-05-18T04:54:25.3655832Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045420.xml 2022-05-18T04:54:26.5623987Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:54:26.5638650Z 2022-05-18T04:54:26.5639062Z Running tests... 2022-05-18T04:54:26.5639559Z ---------------------------------------------------------------------- 2022-05-18T04:54:28.2183369Z test_ddp_multi_device_module_config (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:28.2544141Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55106 2022-05-18T04:54:28.2653070Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55107 2022-05-18T04:54:29.0994817Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:29.1182933Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:29.2695004Z skip: Need at least 4 CUDA devices (2.705s) 2022-05-18T04:54:29.2695366Z 2022-05-18T04:54:29.2695739Z ---------------------------------------------------------------------- 2022-05-18T04:54:29.2696092Z Ran 1 test in 2.706s 2022-05-18T04:54:29.2696261Z 2022-05-18T04:54:29.2696373Z OK (skipped=1) 2022-05-18T04:54:29.2696531Z 2022-05-18T04:54:29.2696661Z Generating XML reports... 2022-05-18T04:54:29.2752761Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045426.xml 2022-05-18T04:54:30.4583797Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:54:30.4598733Z 2022-05-18T04:54:30.4598964Z Running tests... 2022-05-18T04:54:30.4599430Z ---------------------------------------------------------------------- 2022-05-18T04:54:32.1202177Z test_ddp_weight_sharing (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:32.1562855Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55215 2022-05-18T04:54:32.1672691Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55216 2022-05-18T04:54:33.0689701Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:33.0708911Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:34.4086914Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfb777_vo 2022-05-18T04:54:34.4087517Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfb777_vo/_remote_module_non_scriptable.py 2022-05-18T04:54:34.4552070Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4wh97m27 2022-05-18T04:54:34.4553926Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4wh97m27/_remote_module_non_scriptable.py 2022-05-18T04:54:35.5027264Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:54:35.5041931Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:54:35.5593232Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:54:35.5608589Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:54:35.6146031Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:54:35.6162236Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:54:35.6695276Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:54:35.6711121Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:54:36.0778634Z ok (5.618s) 2022-05-18T04:54:36.0779033Z 2022-05-18T04:54:36.0779673Z ---------------------------------------------------------------------- 2022-05-18T04:54:36.0780312Z Ran 1 test in 5.618s 2022-05-18T04:54:36.0780628Z 2022-05-18T04:54:36.0780803Z OK 2022-05-18T04:54:36.0781059Z 2022-05-18T04:54:36.0781263Z Generating XML reports... 2022-05-18T04:54:36.0826036Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045430.xml 2022-05-18T04:54:37.2776605Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:54:37.2791632Z 2022-05-18T04:54:37.2791873Z Running tests... 2022-05-18T04:54:37.2792299Z ---------------------------------------------------------------------- 2022-05-18T04:54:38.9272658Z test_ddp_with_lazy_parameters (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:38.9631292Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55342 2022-05-18T04:54:38.9740565Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55343 2022-05-18T04:54:39.9110666Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:39.9118640Z /opt/conda/lib/python3.7/site-packages/torch/nn/modules/lazy.py:178: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment. 2022-05-18T04:54:39.9119325Z warnings.warn('Lazy modules are a new feature under heavy development ' 2022-05-18T04:54:39.9209529Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxuzps7bs 2022-05-18T04:54:39.9212326Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxuzps7bs/_remote_module_non_scriptable.py 2022-05-18T04:54:39.9345586Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:39.9353425Z /opt/conda/lib/python3.7/site-packages/torch/nn/modules/lazy.py:178: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment. 2022-05-18T04:54:39.9354102Z warnings.warn('Lazy modules are a new feature under heavy development ' 2022-05-18T04:54:39.9439451Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpowtp_w4i 2022-05-18T04:54:39.9441728Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpowtp_w4i/_remote_module_non_scriptable.py 2022-05-18T04:54:40.1784469Z ok (2.899s) 2022-05-18T04:54:40.1784827Z 2022-05-18T04:54:40.1785588Z ---------------------------------------------------------------------- 2022-05-18T04:54:40.1785946Z Ran 1 test in 2.899s 2022-05-18T04:54:40.1786096Z 2022-05-18T04:54:40.1786196Z OK 2022-05-18T04:54:40.1786343Z 2022-05-18T04:54:40.1786478Z Generating XML reports... 
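The "Lazy modules" UserWarning emitted during test_ddp_with_lazy_parameters refers to shape-inferring modules such as torch.nn.LazyLinear, whose parameters stay uninitialized until the first forward pass. A small self-contained illustration (not the test's actual model):

import torch
import torch.nn as nn

layer = nn.LazyLinear(out_features=10)  # in_features unknown at construction
out = layer(torch.randn(4, 7))          # first forward materializes the weight
print(layer.weight.shape)               # torch.Size([10, 7])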
2022-05-18T04:54:40.1829603Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045437.xml 2022-05-18T04:54:41.3518021Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:54:41.3532972Z 2022-05-18T04:54:41.3533351Z Running tests... 2022-05-18T04:54:41.3533898Z ---------------------------------------------------------------------- 2022-05-18T04:54:42.9934368Z test_default_ddp_comm_hooks_nccl (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:43.0285629Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55455 2022-05-18T04:54:43.0393458Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55456 2022-05-18T04:54:43.9003555Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:43.9484014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:45.1889873Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp49j8q_ua 2022-05-18T04:54:45.1891326Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp49j8q_ua/_remote_module_non_scriptable.py 2022-05-18T04:54:45.2322912Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf63bd3e0 2022-05-18T04:54:45.2324995Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf63bd3e0/_remote_module_non_scriptable.py 2022-05-18T04:54:46.5501527Z ok (5.196s) 2022-05-18T04:54:46.5501768Z 2022-05-18T04:54:46.5502191Z ---------------------------------------------------------------------- 2022-05-18T04:54:46.5502546Z Ran 1 test in 5.197s 2022-05-18T04:54:46.5502738Z 2022-05-18T04:54:46.5502816Z OK 2022-05-18T04:54:46.5502954Z 2022-05-18T04:54:46.5503089Z Generating XML reports... 2022-05-18T04:54:46.5546325Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045441.xml 2022-05-18T04:54:47.7226076Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:54:47.7240270Z 2022-05-18T04:54:47.7240506Z Running tests... 2022-05-18T04:54:47.7241110Z ---------------------------------------------------------------------- 2022-05-18T04:54:49.3418798Z test_default_ddp_comm_hooks_nccl_is_view (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:49.3772910Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55582 2022-05-18T04:54:49.3880791Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55583 2022-05-18T04:54:50.3152375Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:50.3366669Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:51.6275894Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdf8uex4n 2022-05-18T04:54:51.6277199Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg70em0wp 2022-05-18T04:54:51.6277912Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdf8uex4n/_remote_module_non_scriptable.py 2022-05-18T04:54:51.6280136Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg70em0wp/_remote_module_non_scriptable.py 2022-05-18T04:54:52.9980383Z ok (5.274s) 2022-05-18T04:54:52.9980727Z 2022-05-18T04:54:52.9981264Z ---------------------------------------------------------------------- 2022-05-18T04:54:52.9981600Z Ran 1 test in 5.274s 2022-05-18T04:54:52.9981767Z 2022-05-18T04:54:52.9981869Z OK 2022-05-18T04:54:52.9982006Z 2022-05-18T04:54:52.9982472Z Generating XML reports... 2022-05-18T04:54:53.0025358Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045447.xml 2022-05-18T04:54:54.1815930Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:54:54.1830503Z 2022-05-18T04:54:54.1830839Z Running tests... 2022-05-18T04:54:54.1831558Z ---------------------------------------------------------------------- 2022-05-18T04:54:55.8267105Z test_failure_recovery (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:54:55.8619128Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55709 2022-05-18T04:54:55.8728048Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55710 2022-05-18T04:54:56.8036054Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:54:56.8330389Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:54:58.1197073Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdibfjf3h 2022-05-18T04:54:58.1197737Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdibfjf3h/_remote_module_non_scriptable.py 2022-05-18T04:54:58.1249768Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppqb9_935 2022-05-18T04:54:58.1252419Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppqb9_935/_remote_module_non_scriptable.py 2022-05-18T04:54:59.4600120Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:54:59.4600670Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:54:59.5092405Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:54:59.5092940Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:54:59.8837274Z ok (5.700s) 2022-05-18T04:54:59.8837516Z 2022-05-18T04:54:59.8837917Z ---------------------------------------------------------------------- 2022-05-18T04:54:59.8838268Z Ran 1 test in 5.701s 2022-05-18T04:54:59.8838417Z 2022-05-18T04:54:59.8838525Z OK 2022-05-18T04:54:59.8838665Z 2022-05-18T04:54:59.8838803Z Generating XML reports... 2022-05-18T04:54:59.8882145Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045454.xml 2022-05-18T04:55:01.0558453Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:55:01.0573163Z 2022-05-18T04:55:01.0573448Z Running tests... 2022-05-18T04:55:01.0573878Z ---------------------------------------------------------------------- 2022-05-18T04:55:02.6730910Z test_find_unused_parameters_kwarg_debug_detail (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:02.7090062Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55847 2022-05-18T04:55:02.7198196Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55848 2022-05-18T04:55:03.6192094Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:03.6442903Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:03.6562224Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:03.6562743Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:03.6563539Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:03.6564233Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:04.9603623Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsy1s6mi8 2022-05-18T04:55:04.9604596Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsy1s6mi8/_remote_module_non_scriptable.py 2022-05-18T04:55:04.9905035Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphgs34i2x 2022-05-18T04:55:04.9908109Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphgs34i2x/_remote_module_non_scriptable.py 2022-05-18T04:55:05.6281055Z ok (4.570s) 2022-05-18T04:55:05.6281421Z 2022-05-18T04:55:05.6282157Z ---------------------------------------------------------------------- 2022-05-18T04:55:05.6282755Z Ran 1 test in 4.571s 2022-05-18T04:55:05.6282923Z 2022-05-18T04:55:05.6283017Z OK 2022-05-18T04:55:05.6283155Z 2022-05-18T04:55:05.6283289Z Generating XML reports... 2022-05-18T04:55:05.6325658Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045501.xml 2022-05-18T04:55:06.8139998Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:55:06.8154478Z 2022-05-18T04:55:06.8154747Z Running tests... 2022-05-18T04:55:06.8155176Z ---------------------------------------------------------------------- 2022-05-18T04:55:08.4664267Z test_find_unused_parameters_kwarg_debug_info (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:08.5024220Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 55979 2022-05-18T04:55:08.5133463Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 55980 2022-05-18T04:55:09.4481091Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:09.4491215Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:09.4655742Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:09.4667090Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:09.4668104Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:09.4695571Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:10.7710694Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3ujffurv 2022-05-18T04:55:10.7713860Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3ujffurv/_remote_module_non_scriptable.py 2022-05-18T04:55:10.7924224Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8uczj1yf 2022-05-18T04:55:10.7926938Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8uczj1yf/_remote_module_non_scriptable.py 2022-05-18T04:55:11.4216996Z ok (4.606s) 2022-05-18T04:55:11.4217234Z 2022-05-18T04:55:11.4217623Z ---------------------------------------------------------------------- 2022-05-18T04:55:11.4217996Z Ran 1 test in 4.606s 2022-05-18T04:55:11.4218165Z 2022-05-18T04:55:11.4218243Z OK 2022-05-18T04:55:11.4218382Z 2022-05-18T04:55:11.4218518Z Generating XML reports... 2022-05-18T04:55:11.4261834Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045506.xml 2022-05-18T04:55:12.5861188Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:55:12.5877382Z 2022-05-18T04:55:12.5877523Z Running tests... 2022-05-18T04:55:12.5878123Z ---------------------------------------------------------------------- 2022-05-18T04:55:14.2340862Z test_find_unused_parameters_kwarg_debug_off (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:14.2701742Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56105 2022-05-18T04:55:14.2810092Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56106 2022-05-18T04:55:15.2069732Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:15.2079268Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:15.2260358Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:15.2271800Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:15.2272890Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:15.2283935Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
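The "store_based_barrier_key" entries above are written to the c10d store while each rank finishes initializing its process group. A hedged illustration of creating such a group against an explicit store; the host, port, and backend choice here are assumptions, not taken from the log:

    # Illustration only: an NCCL process group backed by an explicit TCPStore.
    import torch.distributed as dist

    def init_with_store(rank: int, world_size: int) -> None:
        # Rank 0 hosts the store; the address/port are hypothetical.
        store = dist.TCPStore("127.0.0.1", 29500, world_size, rank == 0)
        dist.init_process_group("nccl", store=store, rank=rank, world_size=world_size)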
2022-05-18T04:55:16.5349468Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7k41sicg 2022-05-18T04:55:16.5350092Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7k41sicg/_remote_module_non_scriptable.py 2022-05-18T04:55:16.5734745Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphjws9aua 2022-05-18T04:55:16.5737702Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphjws9aua/_remote_module_non_scriptable.py 2022-05-18T04:55:17.2895761Z ok (4.701s) 2022-05-18T04:55:17.2896005Z 2022-05-18T04:55:17.2896429Z ---------------------------------------------------------------------- 2022-05-18T04:55:17.2896762Z Ran 1 test in 4.702s 2022-05-18T04:55:17.2896931Z 2022-05-18T04:55:17.2897032Z OK 2022-05-18T04:55:17.2897177Z 2022-05-18T04:55:17.2897310Z Generating XML reports... 2022-05-18T04:55:17.2949118Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045512.xml 2022-05-18T04:55:18.4705289Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:55:18.4720103Z 2022-05-18T04:55:18.4720405Z Running tests... 2022-05-18T04:55:18.4720837Z ---------------------------------------------------------------------- 2022-05-18T04:55:20.1088348Z test_find_unused_parameters_kwarg_grad_is_view_debug_detail (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:20.1440401Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56231 2022-05-18T04:55:20.1549691Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56232 2022-05-18T04:55:21.0516419Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:21.0572280Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:21.0692832Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:21.0693344Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:21.0694154Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:21.0694854Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:22.3840632Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoxfjavi8 2022-05-18T04:55:22.3841445Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoxfjavi8/_remote_module_non_scriptable.py 2022-05-18T04:55:22.4080663Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphkjrqhya 2022-05-18T04:55:22.4083856Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphkjrqhya/_remote_module_non_scriptable.py 2022-05-18T04:55:23.0631023Z ok (4.591s) 2022-05-18T04:55:23.0631268Z 2022-05-18T04:55:23.0631672Z ---------------------------------------------------------------------- 2022-05-18T04:55:23.0632007Z Ran 1 test in 4.591s 2022-05-18T04:55:23.0632458Z 2022-05-18T04:55:23.0632556Z OK 2022-05-18T04:55:23.0632703Z 2022-05-18T04:55:23.0632842Z Generating XML reports... 
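The test_find_unused_parameters_kwarg_debug_* runs above differ in the distributed debug level (detail / info / off). A hedged sketch of the two knobs involved, with a made-up module and the usual per-rank setup assumed:

    # Sketch of find_unused_parameters plus the debug level the *_debug_* variants toggle.
    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def build_ddp(rank: int, world_size: int) -> DDP:
        # Typically set in the environment before distributed setup; "INFO" and "OFF"
        # correspond to the other test variants.
        os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"
        dist.init_process_group("nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(rank)
        model = torch.nn.Linear(8, 8).cuda(rank)
        return DDP(
            model,
            device_ids=[rank],
            # Tolerates parameters that receive no gradient in a given backward pass,
            # at the cost of an extra traversal to mark the unused ones as ready.
            find_unused_parameters=True,
        )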
2022-05-18T04:55:23.0675130Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045518.xml 2022-05-18T04:55:24.2477238Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:55:24.2492379Z 2022-05-18T04:55:24.2492541Z Running tests... 2022-05-18T04:55:24.2492981Z ---------------------------------------------------------------------- 2022-05-18T04:55:25.8779417Z test_find_unused_parameters_kwarg_grad_is_view_debug_info (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:25.9139448Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56363 2022-05-18T04:55:25.9248155Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56364 2022-05-18T04:55:26.8510957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:26.8522121Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:26.8745855Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:26.8758540Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:26.8759321Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:26.8828883Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:28.2178418Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgsezaion 2022-05-18T04:55:28.2179254Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgsezaion/_remote_module_non_scriptable.py 2022-05-18T04:55:28.2364199Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd9em311o 2022-05-18T04:55:28.2367298Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd9em311o/_remote_module_non_scriptable.py 2022-05-18T04:55:28.9333278Z ok (4.684s) 2022-05-18T04:55:28.9333505Z 2022-05-18T04:55:28.9334320Z ---------------------------------------------------------------------- 2022-05-18T04:55:28.9334699Z Ran 1 test in 4.684s 2022-05-18T04:55:28.9334853Z 2022-05-18T04:55:28.9334962Z OK 2022-05-18T04:55:28.9335100Z 2022-05-18T04:55:28.9335244Z Generating XML reports... 2022-05-18T04:55:28.9382350Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045524.xml 2022-05-18T04:55:30.1245530Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:55:30.1261231Z 2022-05-18T04:55:30.1261709Z Running tests... 2022-05-18T04:55:30.1262222Z ---------------------------------------------------------------------- 2022-05-18T04:55:31.7668459Z test_find_unused_parameters_kwarg_grad_is_view_debug_off (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:31.8026680Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56489 2022-05-18T04:55:31.8135966Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56490 2022-05-18T04:55:32.7653988Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:32.7665630Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:55:32.7700035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:32.7712328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:55:32.7713932Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:32.7769155Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:55:34.0792161Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz1t54h4_ 2022-05-18T04:55:34.0793381Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz1t54h4_/_remote_module_non_scriptable.py 2022-05-18T04:55:34.0999008Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy1f1kc2g 2022-05-18T04:55:34.1001859Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy1f1kc2g/_remote_module_non_scriptable.py 2022-05-18T04:55:34.8223944Z ok (4.696s) 2022-05-18T04:55:34.8224427Z 2022-05-18T04:55:34.8225224Z ---------------------------------------------------------------------- 2022-05-18T04:55:34.8225748Z Ran 1 test in 4.696s 2022-05-18T04:55:34.8225919Z 2022-05-18T04:55:34.8226018Z OK 2022-05-18T04:55:34.8226160Z 2022-05-18T04:55:34.8226299Z Generating XML reports... 2022-05-18T04:55:34.8270323Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045530.xml 2022-05-18T04:55:35.9960437Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:55:35.9975195Z 2022-05-18T04:55:35.9975502Z Running tests... 2022-05-18T04:55:35.9975967Z ---------------------------------------------------------------------- 2022-05-18T04:55:37.6003474Z test_fp16 (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:37.6353858Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56615 2022-05-18T04:55:37.6463910Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56616 2022-05-18T04:55:38.5815956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:38.5944070Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:39.9005396Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzlpglctn 2022-05-18T04:55:39.9006000Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzlpglctn/_remote_module_non_scriptable.py 2022-05-18T04:55:39.9281064Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmploxfjk2u 2022-05-18T04:55:39.9283408Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmploxfjk2u/_remote_module_non_scriptable.py 2022-05-18T04:55:41.6570784Z ok (5.659s) 2022-05-18T04:55:41.6571057Z 2022-05-18T04:55:41.6571694Z ---------------------------------------------------------------------- 2022-05-18T04:55:41.6572025Z Ran 1 test in 5.659s 2022-05-18T04:55:41.6572196Z 2022-05-18T04:55:41.6572296Z OK 2022-05-18T04:55:41.6572435Z 2022-05-18T04:55:41.6572574Z Generating XML reports... 2022-05-18T04:55:41.6615594Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045535.xml 2022-05-18T04:55:42.8480312Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:55:42.8495488Z 2022-05-18T04:55:42.8495636Z Running tests... 2022-05-18T04:55:42.8496729Z ---------------------------------------------------------------------- 2022-05-18T04:55:44.4997361Z test_fp16_compress_wrapper_is_view (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:44.5357316Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56742 2022-05-18T04:55:44.5466181Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56743 2022-05-18T04:55:45.4821269Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:45.4822407Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:55:45.5015277Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:45.5018025Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:55:46.8149749Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxl0z29pj 2022-05-18T04:55:46.8150362Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxl0z29pj/_remote_module_non_scriptable.py 2022-05-18T04:55:46.8200754Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp8e5qjqnr 2022-05-18T04:55:46.8203378Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp8e5qjqnr/_remote_module_non_scriptable.py 2022-05-18T04:55:48.1565832Z ok (5.307s) 2022-05-18T04:55:48.1566084Z 2022-05-18T04:55:48.1566502Z ---------------------------------------------------------------------- 2022-05-18T04:55:48.1566836Z Ran 1 test in 5.307s 2022-05-18T04:55:48.1567007Z 2022-05-18T04:55:48.1567101Z OK 2022-05-18T04:55:48.1567253Z 2022-05-18T04:55:48.1567395Z Generating XML reports... 2022-05-18T04:55:48.1609265Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045542.xml 2022-05-18T04:55:49.3346256Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:55:49.3360072Z 2022-05-18T04:55:49.3360309Z Running tests... 2022-05-18T04:55:49.3360757Z ---------------------------------------------------------------------- 2022-05-18T04:55:50.9378856Z test_fp16_compress_wrapper_nccl (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:50.9735206Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56869 2022-05-18T04:55:50.9846659Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56870 2022-05-18T04:55:51.9065405Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:51.9066305Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:55:51.9161585Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:51.9164373Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:55:53.2372027Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4n34w4hu 2022-05-18T04:55:53.2372659Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4n34w4hu/_remote_module_non_scriptable.py 2022-05-18T04:55:53.2582298Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7tj970zs 2022-05-18T04:55:53.2584862Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7tj970zs/_remote_module_non_scriptable.py 2022-05-18T04:55:54.6945926Z ok (5.358s) 2022-05-18T04:55:54.6946170Z 2022-05-18T04:55:54.6946562Z ---------------------------------------------------------------------- 2022-05-18T04:55:54.6946918Z Ran 1 test in 5.358s 2022-05-18T04:55:54.6947437Z 2022-05-18T04:55:54.6947535Z OK 2022-05-18T04:55:54.6947682Z 2022-05-18T04:55:54.6947831Z Generating XML reports... 2022-05-18T04:55:54.6992564Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045549.xml 2022-05-18T04:55:55.8867870Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:55:55.8883102Z 2022-05-18T04:55:55.8883844Z Running tests... 2022-05-18T04:55:55.8884380Z ---------------------------------------------------------------------- 2022-05-18T04:55:57.5340640Z test_fp16_grad_is_view (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:55:57.5702218Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 56996 2022-05-18T04:55:57.5811900Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 56997 2022-05-18T04:55:58.4750242Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:55:58.4894431Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:55:59.8053416Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp421a2a4y 2022-05-18T04:55:59.8054612Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp421a2a4y/_remote_module_non_scriptable.py 2022-05-18T04:55:59.8481501Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt1996h83 2022-05-18T04:55:59.8482564Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt1996h83/_remote_module_non_scriptable.py 2022-05-18T04:56:01.4913931Z ok (5.603s) 2022-05-18T04:56:01.4914183Z 2022-05-18T04:56:01.4914601Z ---------------------------------------------------------------------- 2022-05-18T04:56:01.4914952Z Ran 1 test in 5.603s 2022-05-18T04:56:01.4915125Z 2022-05-18T04:56:01.4915223Z OK 2022-05-18T04:56:01.4918890Z 2022-05-18T04:56:01.4919414Z Generating XML reports... 2022-05-18T04:56:01.4959818Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045555.xml 2022-05-18T04:56:02.6881733Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:56:02.6897666Z 2022-05-18T04:56:02.6897965Z Running tests... 2022-05-18T04:56:02.6898599Z ---------------------------------------------------------------------- 2022-05-18T04:56:04.3439391Z test_grad_layout_1devicemodule_1replicaperprocess (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:56:04.3800597Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57123 2022-05-18T04:56:04.3910030Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57124 2022-05-18T04:56:05.3013355Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:05.3031933Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:06.6337678Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm_03x3zo 2022-05-18T04:56:06.6338540Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm_03x3zo/_remote_module_non_scriptable.py 2022-05-18T04:56:06.6564822Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp26vdy7yw 2022-05-18T04:56:06.6567154Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp26vdy7yw/_remote_module_non_scriptable.py 2022-05-18T04:56:08.8194927Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:08.8195508Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:08.8521128Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:08.8521674Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:08.8944622Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T04:56:08.8945158Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:08.9287103Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:08.9287895Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:08.9624841Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:08.9625353Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:08.9972577Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:08.9973071Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.0310811Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.0311314Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.0652227Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.0652718Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.0992859Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.0993349Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.1346811Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.1347299Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.1692156Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.1692651Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.2046357Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.2046851Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.2383509Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.2384000Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.2719238Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.2719737Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.3053132Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.3053621Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.3405843Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.3406337Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.3746117Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.3746606Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:09.4093932Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
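Several runs above carry a _grad_is_view suffix; those variants construct DDP with gradient views enabled. A hedged sketch of that option (placeholder module; the default process group is assumed to be initialized already):

    # Sketch of the gradient_as_bucket_view option the *_grad_is_view variants toggle.
    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap_with_bucket_views(rank: int) -> DDP:
        module = torch.nn.Linear(4, 4).cuda(rank)
        return DDP(
            module,
            device_ids=[rank],
            # param.grad becomes a view into the communication buckets instead of a
            # separate tensor, avoiding one gradient-sized copy per iteration.
            gradient_as_bucket_view=True,
        )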
2022-05-18T04:56:09.4094421Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:10.0051395Z ok (7.315s) 2022-05-18T04:56:10.0051719Z 2022-05-18T04:56:10.0052296Z ---------------------------------------------------------------------- 2022-05-18T04:56:10.0052710Z Ran 1 test in 7.315s 2022-05-18T04:56:10.0052860Z 2022-05-18T04:56:10.0052957Z OK 2022-05-18T04:56:10.0053098Z 2022-05-18T04:56:10.0053232Z Generating XML reports... 2022-05-18T04:56:10.0096611Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045602.xml 2022-05-18T04:56:11.1754870Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:56:11.1769218Z 2022-05-18T04:56:11.1769444Z Running tests... 2022-05-18T04:56:11.1770154Z ---------------------------------------------------------------------- 2022-05-18T04:56:12.7834508Z test_grad_layout_2devicemodule (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:56:12.8186550Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57250 2022-05-18T04:56:12.8297517Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57251 2022-05-18T04:56:13.7849176Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:13.8190905Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:14.0340915Z skip: Need at least 4 CUDA devices (2.857s) 2022-05-18T04:56:14.0341149Z 2022-05-18T04:56:14.0341544Z ---------------------------------------------------------------------- 2022-05-18T04:56:14.0341907Z Ran 1 test in 2.857s 2022-05-18T04:56:14.0342074Z 2022-05-18T04:56:14.0342186Z OK (skipped=1) 2022-05-18T04:56:14.0342350Z 2022-05-18T04:56:14.0342461Z Generating XML reports... 2022-05-18T04:56:14.0397868Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045611.xml 2022-05-18T04:56:15.1986603Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:56:15.2000959Z 2022-05-18T04:56:15.2001366Z Running tests... 2022-05-18T04:56:15.2001863Z ---------------------------------------------------------------------- 2022-05-18T04:56:16.8179213Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:56:16.8539345Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57359 2022-05-18T04:56:16.8644541Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57360 2022-05-18T04:56:17.7794470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:17.7799939Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:56:17.7801276Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:56:17.7802366Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:56:17.7803438Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:56:17.7804482Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:56:17.7805938Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:56:17.7887463Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:17.7895487Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:56:17.7896787Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:56:17.7898088Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: 
matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:56:17.7899156Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:56:17.7900221Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:56:17.7901283Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:56:17.9686114Z ok (2.768s) 2022-05-18T04:56:17.9686310Z 2022-05-18T04:56:17.9686682Z ---------------------------------------------------------------------- 2022-05-18T04:56:17.9687040Z Ran 1 test in 2.768s 2022-05-18T04:56:17.9687229Z 2022-05-18T04:56:17.9687324Z OK 2022-05-18T04:56:17.9687461Z 2022-05-18T04:56:17.9687592Z Generating XML reports... 2022-05-18T04:56:17.9731825Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045615.xml 2022-05-18T04:56:19.1518997Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:56:19.1534089Z 2022-05-18T04:56:19.1534559Z Running tests... 2022-05-18T04:56:19.1535055Z ---------------------------------------------------------------------- 2022-05-18T04:56:20.8220034Z test_multiple_outputs_multiple_backward (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:56:20.8586595Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57468 2022-05-18T04:56:20.8695854Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57469 2022-05-18T04:56:21.7742645Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:21.7744042Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:23.1078869Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpl7gjgbl2 2022-05-18T04:56:23.1079939Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpl7gjgbl2/_remote_module_non_scriptable.py 2022-05-18T04:56:23.1207517Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp45ivpw5n 2022-05-18T04:56:23.1209863Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp45ivpw5n/_remote_module_non_scriptable.py 2022-05-18T04:56:24.7800647Z ok (5.626s) 2022-05-18T04:56:24.7800988Z 2022-05-18T04:56:24.7801620Z ---------------------------------------------------------------------- 2022-05-18T04:56:24.7801983Z Ran 1 test in 5.627s 2022-05-18T04:56:24.7802156Z 2022-05-18T04:56:24.7802253Z OK 2022-05-18T04:56:24.7802394Z 2022-05-18T04:56:24.7806645Z Generating XML reports... 2022-05-18T04:56:24.7844609Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045619.xml 2022-05-18T04:56:25.9711278Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:56:25.9726469Z 2022-05-18T04:56:25.9727261Z Running tests... 2022-05-18T04:56:25.9727979Z ---------------------------------------------------------------------- 2022-05-18T04:56:27.6204194Z test_multiple_outputs_multiple_backward_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:56:27.6555171Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57595 2022-05-18T04:56:27.6662871Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57596 2022-05-18T04:56:28.5646423Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:28.6049194Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:29.8838410Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpp_9t5x07 2022-05-18T04:56:29.8839045Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpp_9t5x07/_remote_module_non_scriptable.py 2022-05-18T04:56:29.8857517Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp716pdexz 2022-05-18T04:56:29.8860487Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp716pdexz/_remote_module_non_scriptable.py 2022-05-18T04:56:31.5782871Z ok (5.605s) 2022-05-18T04:56:31.5783114Z 2022-05-18T04:56:31.5783516Z ---------------------------------------------------------------------- 2022-05-18T04:56:31.5783864Z Ran 1 test in 5.606s 2022-05-18T04:56:31.5784034Z 2022-05-18T04:56:31.5784131Z OK 2022-05-18T04:56:31.5784251Z 2022-05-18T04:56:31.5784395Z Generating XML reports... 
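The PowerSGD config entries logged by the test_fp16_compress_wrapper_* runs above list the hook state fields (matrix_approximation_rank, start_powerSGD_iter, and so on). A hedged sketch of how such a wrapped communication hook is registered on a DDP model; the values mirror the logged config, while the surrounding setup is assumed:

    # Illustration only: a PowerSGD hook wrapped in FP16 compression, the combination
    # the fp16_compress_wrapper tests above exercise.
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.distributed.algorithms.ddp_comm_hooks import default_hooks, powerSGD_hook

    def register_fp16_powersgd(ddp_model: DDP) -> None:
        state = powerSGD_hook.PowerSGDState(
            process_group=None,           # default process group
            matrix_approximation_rank=1,  # matches the logged config
            start_powerSGD_iter=1000,
        )
        # fp16_compress_wrapper casts gradients to float16 around the inner hook.
        hook = default_hooks.fp16_compress_wrapper(powerSGD_hook.powerSGD_hook)
        ddp_model.register_comm_hook(state, hook)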
2022-05-18T04:56:31.5827610Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045625.xml 2022-05-18T04:56:32.7667138Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:56:32.7682009Z 2022-05-18T04:56:32.7682139Z Running tests... 2022-05-18T04:56:32.7683297Z ---------------------------------------------------------------------- 2022-05-18T04:56:34.4244767Z test_nccl_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:56:34.4608659Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57722 2022-05-18T04:56:34.4719025Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57723 2022-05-18T04:56:35.3575169Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:35.3725328Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:36.6889483Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1bry3c5l 2022-05-18T04:56:36.6891217Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1bry3c5l/_remote_module_non_scriptable.py 2022-05-18T04:56:36.7166057Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprgjcci9t 2022-05-18T04:56:36.7168866Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprgjcci9t/_remote_module_non_scriptable.py 2022-05-18T04:56:38.0618604Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:38.0638592Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:38.3823206Z ok (5.614s) 2022-05-18T04:56:38.3823569Z 2022-05-18T04:56:38.3824316Z ---------------------------------------------------------------------- 2022-05-18T04:56:38.3824962Z Ran 1 test in 5.614s 2022-05-18T04:56:38.3825139Z 2022-05-18T04:56:38.3825217Z OK 2022-05-18T04:56:38.3825359Z 2022-05-18T04:56:38.3825513Z Generating XML reports... 2022-05-18T04:56:38.3868143Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045632.xml 2022-05-18T04:56:39.5612669Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:56:39.5627118Z 2022-05-18T04:56:39.5627617Z Running tests... 2022-05-18T04:56:39.5628131Z ---------------------------------------------------------------------- 2022-05-18T04:56:41.1697368Z test_nccl_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:56:41.2048200Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57849 2022-05-18T04:56:41.2156200Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57850 2022-05-18T04:56:42.1073917Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:42.1161751Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:43.4185469Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpn0yxw1il 2022-05-18T04:56:43.4186644Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpn0yxw1il/_remote_module_non_scriptable.py 2022-05-18T04:56:43.4724331Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4uscbezj 2022-05-18T04:56:43.4725491Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4uscbezj/_remote_module_non_scriptable.py 2022-05-18T04:56:44.7411735Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:44.7412792Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:56:45.1263103Z ok (5.563s) 2022-05-18T04:56:45.1263343Z 2022-05-18T04:56:45.1263760Z ---------------------------------------------------------------------- 2022-05-18T04:56:45.1264106Z Ran 1 test in 5.564s 2022-05-18T04:56:45.1264274Z 2022-05-18T04:56:45.1264384Z OK 2022-05-18T04:56:45.1264522Z 2022-05-18T04:56:45.1264663Z Generating XML reports... 2022-05-18T04:56:45.1307875Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045639.xml 2022-05-18T04:56:46.3068861Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:56:46.3083343Z 2022-05-18T04:56:46.3083500Z Running tests... 2022-05-18T04:56:46.3084357Z ---------------------------------------------------------------------- 2022-05-18T04:56:47.9143352Z test_nccl_backend_2gpu_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:56:47.9493049Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 57976 2022-05-18T04:56:47.9602053Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 57977 2022-05-18T04:56:48.7996886Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:48.8358222Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:48.9643889Z skip: Need at least 4 CUDA devices (2.656s) 2022-05-18T04:56:48.9644133Z 2022-05-18T04:56:48.9644510Z ---------------------------------------------------------------------- 2022-05-18T04:56:48.9645065Z Ran 1 test in 2.656s 2022-05-18T04:56:48.9645256Z 2022-05-18T04:56:48.9645371Z OK (skipped=1) 2022-05-18T04:56:48.9645538Z 2022-05-18T04:56:48.9645666Z Generating XML reports... 2022-05-18T04:56:48.9700723Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045646.xml 2022-05-18T04:56:50.1321688Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:56:50.1337010Z 2022-05-18T04:56:50.1337278Z Running tests... 
2022-05-18T04:56:50.1337735Z ---------------------------------------------------------------------- 2022-05-18T04:56:51.7801949Z test_nccl_backend_4gpu_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:56:51.8159923Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58085 2022-05-18T04:56:51.8268510Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58086 2022-05-18T04:56:52.7124233Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:52.7192765Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:52.9310244Z skip: Need at least 8 CUDA devices (2.797s) 2022-05-18T04:56:52.9310508Z 2022-05-18T04:56:52.9310912Z ---------------------------------------------------------------------- 2022-05-18T04:56:52.9311242Z Ran 1 test in 2.797s 2022-05-18T04:56:52.9311412Z 2022-05-18T04:56:52.9311527Z OK (skipped=1) 2022-05-18T04:56:52.9311686Z 2022-05-18T04:56:52.9311834Z Generating XML reports... 2022-05-18T04:56:52.9368722Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045650.xml 2022-05-18T04:56:54.1153930Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:56:54.1168806Z 2022-05-18T04:56:54.1169185Z Running tests... 2022-05-18T04:56:54.1169706Z ---------------------------------------------------------------------- 2022-05-18T04:56:55.7431274Z test_nccl_backend_multi_device_ids_not_allowed (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:56:55.7784364Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58194 2022-05-18T04:56:55.7893318Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58195 2022-05-18T04:56:56.6915412Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:56:56.6929061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:56:58.2965447Z ok (4.179s) 2022-05-18T04:56:58.2965783Z 2022-05-18T04:56:58.2966314Z ---------------------------------------------------------------------- 2022-05-18T04:56:58.2966672Z Ran 1 test in 4.180s 2022-05-18T04:56:58.2966840Z 2022-05-18T04:56:58.2966937Z OK 2022-05-18T04:56:58.2967073Z 2022-05-18T04:56:58.2967191Z Generating XML reports... 2022-05-18T04:56:58.3009498Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045654.xml 2022-05-18T04:56:59.4953985Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:56:59.4968535Z 2022-05-18T04:56:59.4968821Z Running tests... 2022-05-18T04:56:59.4969272Z ---------------------------------------------------------------------- 2022-05-18T04:57:01.1559863Z test_nccl_backend_multi_device_module_device_ids_None (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:57:01.1919940Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58309 2022-05-18T04:57:01.2030549Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58310 2022-05-18T04:57:02.1305124Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:02.1564598Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:02.3074733Z skip: Need at least 4 CUDA devices (2.810s) 2022-05-18T04:57:02.3075000Z 2022-05-18T04:57:02.3075570Z ---------------------------------------------------------------------- 2022-05-18T04:57:02.3075901Z Ran 1 test in 2.811s 2022-05-18T04:57:02.3076069Z 2022-05-18T04:57:02.3076185Z OK (skipped=1) 2022-05-18T04:57:02.3076345Z 2022-05-18T04:57:02.3077046Z Generating XML reports... 2022-05-18T04:57:02.3131916Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045659.xml 2022-05-18T04:57:03.5034174Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:57:03.5049380Z 2022-05-18T04:57:03.5049654Z Running tests... 2022-05-18T04:57:03.5050108Z ---------------------------------------------------------------------- 2022-05-18T04:57:05.1702986Z test_nccl_backend_single_device_module_device_ids_None (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:57:05.2068835Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58418 2022-05-18T04:57:05.2179515Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58419 2022-05-18T04:57:06.1512487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:06.1745261Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:07.4773833Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyka_t1vu 2022-05-18T04:57:07.4774935Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyka_t1vu/_remote_module_non_scriptable.py 2022-05-18T04:57:07.4965897Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcluw7860 2022-05-18T04:57:07.4968276Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcluw7860/_remote_module_non_scriptable.py 2022-05-18T04:57:08.8075885Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:57:08.8076636Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:57:09.1285237Z ok (5.623s) 2022-05-18T04:57:09.1285463Z 2022-05-18T04:57:09.1285847Z ---------------------------------------------------------------------- 2022-05-18T04:57:09.1286198Z Ran 1 test in 5.623s 2022-05-18T04:57:09.1286369Z 2022-05-18T04:57:09.1286475Z OK 2022-05-18T04:57:09.1286612Z 2022-05-18T04:57:09.1286747Z Generating XML reports... 2022-05-18T04:57:09.1332434Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045703.xml 2022-05-18T04:57:10.3349927Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:57:10.3364997Z 2022-05-18T04:57:10.3365462Z Running tests... 
2022-05-18T04:57:10.3365997Z ---------------------------------------------------------------------- 2022-05-18T04:57:11.9909533Z test_nccl_backend_single_device_module_empty_device_ids (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:57:12.0276562Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58545 2022-05-18T04:57:12.0387150Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58546 2022-05-18T04:57:12.9438348Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:12.9701242Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:14.2765344Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpofn87nux 2022-05-18T04:57:14.2766429Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpofn87nux/_remote_module_non_scriptable.py 2022-05-18T04:57:14.2930168Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxn6yncix 2022-05-18T04:57:14.2933407Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxn6yncix/_remote_module_non_scriptable.py 2022-05-18T04:57:15.6381845Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:57:15.6382397Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:57:15.9490278Z ok (5.612s) 2022-05-18T04:57:15.9490490Z 2022-05-18T04:57:15.9491269Z ---------------------------------------------------------------------- 2022-05-18T04:57:15.9491603Z Ran 1 test in 5.613s 2022-05-18T04:57:15.9491792Z 2022-05-18T04:57:15.9491896Z OK 2022-05-18T04:57:15.9492035Z 2022-05-18T04:57:15.9492172Z Generating XML reports... 2022-05-18T04:57:15.9534781Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045710.xml 2022-05-18T04:57:17.1421044Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:57:17.1435935Z 2022-05-18T04:57:17.1436272Z Running tests... 2022-05-18T04:57:17.1436741Z ---------------------------------------------------------------------- 2022-05-18T04:57:18.8087184Z test_nccl_propagate_error_reason (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:57:18.8449007Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58672 2022-05-18T04:57:18.8559801Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58673 2022-05-18T04:57:19.7579373Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:19.7790555Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:38.1004354Z ok (20.956s) 2022-05-18T04:57:38.1004576Z 2022-05-18T04:57:38.1007606Z ---------------------------------------------------------------------- 2022-05-18T04:57:38.1008014Z Ran 1 test in 20.957s 2022-05-18T04:57:38.1008188Z 2022-05-18T04:57:38.1008284Z OK 2022-05-18T04:57:38.1008426Z 2022-05-18T04:57:38.1008574Z Generating XML reports... 
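The test_nccl_backend_*device_ids* runs above cover the accepted forms of DDP's device_ids argument for a single-device module. A hedged sketch of the variants (placeholder module; the default process group is assumed to be initialized):

    # Sketch of the device_ids forms exercised above; any one of the commented
    # alternatives is accepted for a single-CUDA-device module.
    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    def wrap(rank: int) -> DDP:
        module = torch.nn.Linear(4, 4).cuda(rank)
        #   DDP(module, device_ids=[rank])                          # integer list
        #   DDP(module, device_ids=[torch.device(f"cuda:{rank}")])  # torch.device list
        #   DDP(module)                                             # None/empty: inferred from parameters
        return DDP(module, device_ids=[rank])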
2022-05-18T04:57:38.1049223Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045717.xml 2022-05-18T04:57:39.2709558Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:57:39.2724015Z 2022-05-18T04:57:39.2724514Z Running tests... 2022-05-18T04:57:39.2725017Z ---------------------------------------------------------------------- 2022-05-18T04:57:39.2745796Z test_no_grad (__main__.DistributedDataParallelTest) 2022-05-18T04:57:40.8713180Z Note: this test can be sped up by only running it on a CPU module ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:57:40.9064808Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58799 2022-05-18T04:57:40.9174926Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58800 2022-05-18T04:57:41.8171429Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:41.8525825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:43.1413216Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmperd3fwp5 2022-05-18T04:57:43.1414419Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmperd3fwp5/_remote_module_non_scriptable.py 2022-05-18T04:57:43.1470713Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcnp50fgb 2022-05-18T04:57:43.1472955Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcnp50fgb/_remote_module_non_scriptable.py 2022-05-18T04:57:44.8278984Z ok (5.555s) 2022-05-18T04:57:44.8279250Z 2022-05-18T04:57:44.8279673Z ---------------------------------------------------------------------- 2022-05-18T04:57:44.8280026Z Ran 1 test in 5.555s 2022-05-18T04:57:44.8280197Z 2022-05-18T04:57:44.8280295Z OK 2022-05-18T04:57:44.8281884Z 2022-05-18T04:57:44.8282498Z Generating XML reports... 2022-05-18T04:57:44.8325361Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045739.xml 2022-05-18T04:57:46.0168798Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:57:46.0184606Z 2022-05-18T04:57:46.0185094Z Running tests... 2022-05-18T04:57:46.0185619Z ---------------------------------------------------------------------- 2022-05-18T04:57:47.6683475Z test_param_layout_mismatch_error (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:57:47.7043786Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 58922 2022-05-18T04:57:47.7153911Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 58923 2022-05-18T04:57:48.6104364Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:48.6208986Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:49.9198505Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2b7fosa_ 2022-05-18T04:57:49.9199127Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2b7fosa_/_remote_module_non_scriptable.py 2022-05-18T04:57:49.9263454Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgk0i605a 2022-05-18T04:57:49.9266098Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgk0i605a/_remote_module_non_scriptable.py 2022-05-18T04:57:51.3252607Z ok (5.306s) 2022-05-18T04:57:51.3252837Z 2022-05-18T04:57:51.3253262Z ---------------------------------------------------------------------- 2022-05-18T04:57:51.3253617Z Ran 1 test in 5.307s 2022-05-18T04:57:51.3253793Z 2022-05-18T04:57:51.3253881Z OK 2022-05-18T04:57:51.3254022Z 2022-05-18T04:57:51.3254158Z Generating XML reports... 2022-05-18T04:57:51.3302037Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045746.xml 2022-05-18T04:57:52.5223953Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:57:52.5240216Z 2022-05-18T04:57:52.5240670Z Running tests... 2022-05-18T04:57:52.5241184Z ---------------------------------------------------------------------- 2022-05-18T04:57:54.1793700Z test_pass_default_pg (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:57:54.2157066Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59045 2022-05-18T04:57:54.2266805Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59046 2022-05-18T04:57:55.1255321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:55.1259979Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:57:55.1690478Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:55.1696007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:57:55.1697208Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:55.1770151Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:57:55.3309343Z ok (2.807s) 2022-05-18T04:57:55.3309730Z 2022-05-18T04:57:55.3310757Z ---------------------------------------------------------------------- 2022-05-18T04:57:55.3311397Z Ran 1 test in 2.807s 2022-05-18T04:57:55.3311691Z 2022-05-18T04:57:55.3311863Z OK 2022-05-18T04:57:55.3312103Z 2022-05-18T04:57:55.3312312Z Generating XML reports... 
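The repeated `store_based_barrier_key` lines in the output above are emitted while a process group is being created: each rank writes a key into the rendezvous store and waits until all ranks have done so. A minimal sketch of that setup is below (assumptions: two local processes, an explicit TCPStore on a hypothetical port instead of the test harness's own rendezvous, and a placeholder helper name).

# Sketch only: NCCL process-group init whose store-based barrier produces the
# "Added key: store_based_barrier_key:1 ..." / "Completed store-based barrier ..." log lines.
import datetime
import torch.distributed as dist

def init_nccl_group(rank: int, world_size: int) -> None:
    store = dist.TCPStore("127.0.0.1", 29500, world_size, is_master=(rank == 0))
    dist.init_process_group(
        backend="nccl",
        store=store,
        rank=rank,
        world_size=world_size,
        timeout=datetime.timedelta(seconds=60),
    )
    # Once this returns, every rank has passed the store-based barrier.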
2022-05-18T04:57:55.3355880Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045752.xml 2022-05-18T04:57:56.5078638Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:57:56.5093865Z 2022-05-18T04:57:56.5094254Z Running tests... 2022-05-18T04:57:56.5094748Z ---------------------------------------------------------------------- 2022-05-18T04:57:58.1556122Z test_powerSGD_ddp_comm_hook_nccl (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:57:58.1916203Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59158 2022-05-18T04:57:58.2026207Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59159 2022-05-18T04:57:59.0970006Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:57:59.0971364Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:57:59.1029910Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:57:59.1032732Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:00.4239546Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpibt8bxx9 2022-05-18T04:58:00.4240363Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpibt8bxx9/_remote_module_non_scriptable.py 2022-05-18T04:58:00.4385896Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4b10suv2 2022-05-18T04:58:00.4388803Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4b10suv2/_remote_module_non_scriptable.py 2022-05-18T04:58:01.5025669Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:01.5026813Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:01.5077554Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:01.5078693Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; 
start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:01.5130292Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:01.5132355Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:01.5183448Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:01.5184539Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:01.5236223Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:01.5237300Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:01.5290086Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:01.5291900Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:01.5343220Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; 
batch_tensors_with_same_shape = False 2022-05-18T04:58:01.5344320Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:01.9128450Z ok (5.403s) 2022-05-18T04:58:01.9128680Z 2022-05-18T04:58:01.9129452Z ---------------------------------------------------------------------- 2022-05-18T04:58:01.9129844Z Ran 1 test in 5.403s 2022-05-18T04:58:01.9130019Z 2022-05-18T04:58:01.9130099Z OK 2022-05-18T04:58:01.9130493Z 2022-05-18T04:58:01.9130643Z Generating XML reports... 2022-05-18T04:58:01.9174372Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045756.xml 2022-05-18T04:58:03.0741403Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:03.0756130Z 2022-05-18T04:58:03.0756328Z Running tests... 2022-05-18T04:58:03.0756782Z ---------------------------------------------------------------------- 2022-05-18T04:58:04.7129893Z test_powerSGD_ddp_comm_hook_nccl_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:58:04.7488151Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59285 2022-05-18T04:58:04.7596628Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59286 2022-05-18T04:58:05.6863812Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:05.6864925Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:05.7140842Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:05.7142502Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:06.9947395Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiwyi_dpr 2022-05-18T04:58:06.9948026Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiwyi_dpr/_remote_module_non_scriptable.py 2022-05-18T04:58:07.0020796Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_egpyn7h 2022-05-18T04:58:07.0023519Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_egpyn7h/_remote_module_non_scriptable.py 2022-05-18T04:58:08.0593502Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:08.0594651Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD 
config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:08.0644930Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:08.0646036Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:08.0697460Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:08.0698848Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:08.0749880Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:08.0750975Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:08.0802052Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:08.0803508Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:08.0855518Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; 
compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:08.0856806Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = True 2022-05-18T04:58:08.0908402Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:08.0909532Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1000; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T04:58:08.4697120Z ok (5.394s) 2022-05-18T04:58:08.4697320Z 2022-05-18T04:58:08.4697966Z ---------------------------------------------------------------------- 2022-05-18T04:58:08.4698319Z Ran 1 test in 5.394s 2022-05-18T04:58:08.4698488Z 2022-05-18T04:58:08.4698588Z OK 2022-05-18T04:58:08.4698713Z 2022-05-18T04:58:08.4698850Z Generating XML reports... 2022-05-18T04:58:08.4742407Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045803.xml 2022-05-18T04:58:09.6579307Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:09.6594702Z 2022-05-18T04:58:09.6594972Z Running tests... 2022-05-18T04:58:09.6595724Z ---------------------------------------------------------------------- 2022-05-18T04:58:11.3174283Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:58:11.3537460Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59412 2022-05-18T04:58:11.3646456Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59413 2022-05-18T04:58:12.3028365Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:12.3333949Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:13.5904964Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgxeob_s9 2022-05-18T04:58:13.5905589Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgxeob_s9/_remote_module_non_scriptable.py 2022-05-18T04:58:13.6602895Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxk1w32ew 2022-05-18T04:58:13.6604306Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxk1w32ew/_remote_module_non_scriptable.py 2022-05-18T04:58:15.8832962Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:58:15.8833516Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
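The long run of `PowerSGD config: ...` lines above is logged each time a PowerSGD gradient-compression hook is attached to DDP; the test sweeps `use_error_feedback`, `warm_start`, and `batch_tensors_with_same_shape`. A minimal sketch of registering such a hook follows (assumptions: `ddp_model` is an already-constructed DDP module on an initialized NCCL group; the function name is a placeholder).

# Sketch only: attach a PowerSGD communication hook to a DDP model.
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

def register_powersgd(ddp_model) -> None:
    state = powerSGD.PowerSGDState(
        process_group=None,           # None selects the default process group
        matrix_approximation_rank=1,  # matches the logged config above
        start_powerSGD_iter=1000,
        use_error_feedback=True,
        warm_start=True,
    )
    # Gradients will now be compressed via PowerSGD during allreduce.
    ddp_model.register_comm_hook(state, powerSGD.powerSGD_hook)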
2022-05-18T04:58:16.3774890Z ok (6.718s) 2022-05-18T04:58:16.3775112Z 2022-05-18T04:58:16.3775510Z ---------------------------------------------------------------------- 2022-05-18T04:58:16.3775873Z Ran 1 test in 6.718s 2022-05-18T04:58:16.3776043Z 2022-05-18T04:58:16.3776121Z OK 2022-05-18T04:58:16.3776261Z 2022-05-18T04:58:16.3776399Z Generating XML reports... 2022-05-18T04:58:16.3819549Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045809.xml 2022-05-18T04:58:17.5572661Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:17.5587936Z 2022-05-18T04:58:17.5588249Z Running tests... 2022-05-18T04:58:17.5588692Z ---------------------------------------------------------------------- 2022-05-18T04:58:19.1832310Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:58:19.2190547Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59539 2022-05-18T04:58:19.2298335Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59540 2022-05-18T04:58:20.1658132Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:20.1693478Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:21.5008131Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7j49_877 2022-05-18T04:58:21.5008751Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7j49_877/_remote_module_non_scriptable.py 2022-05-18T04:58:21.5061037Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpk10umzez 2022-05-18T04:58:21.5064091Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpk10umzez/_remote_module_non_scriptable.py 2022-05-18T04:58:23.0830662Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:58:23.0831242Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T04:58:23.4409027Z ok (5.882s) 2022-05-18T04:58:23.4409255Z 2022-05-18T04:58:23.4409653Z ---------------------------------------------------------------------- 2022-05-18T04:58:23.4409987Z Ran 1 test in 5.882s 2022-05-18T04:58:23.4410156Z 2022-05-18T04:58:23.4410444Z OK 2022-05-18T04:58:23.4413136Z 2022-05-18T04:58:23.4413484Z Generating XML reports... 2022-05-18T04:58:23.4454232Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045817.xml 2022-05-18T04:58:24.6195274Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:24.6209643Z 2022-05-18T04:58:24.6210202Z Running tests... 2022-05-18T04:58:24.6210705Z ---------------------------------------------------------------------- 2022-05-18T04:58:26.2169293Z test_invalid_nccl_blocking_wait_env (__main__.NcclErrorHandlingTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:58:26.2519142Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59666 2022-05-18T04:58:26.2626041Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59667 2022-05-18T04:58:26.2738496Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 59668 2022-05-18T04:58:27.1621715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:27.1748330Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:58:27.1797368Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:27.3782665Z skip: Need at least 3 CUDA devices (2.757s) 2022-05-18T04:58:27.3782925Z 2022-05-18T04:58:27.3783314Z ---------------------------------------------------------------------- 2022-05-18T04:58:27.3783664Z Ran 1 test in 2.757s 2022-05-18T04:58:27.3783836Z 2022-05-18T04:58:27.3783948Z OK (skipped=1) 2022-05-18T04:58:27.3784115Z 2022-05-18T04:58:27.3784250Z Generating XML reports... 2022-05-18T04:58:27.3839837Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045824.xml 2022-05-18T04:58:28.5538754Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:28.5553783Z 2022-05-18T04:58:28.5554093Z Running tests... 2022-05-18T04:58:28.5554518Z ---------------------------------------------------------------------- 2022-05-18T04:58:30.1971033Z test_nccl_blocking_wait_with_barrier (__main__.NcclErrorHandlingTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:58:30.2332332Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59811 2022-05-18T04:58:30.2440790Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59812 2022-05-18T04:58:30.2549925Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 59813 2022-05-18T04:58:31.1732946Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:31.1813669Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:31.2352129Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:58:31.4598855Z skip: Need at least 3 CUDA devices (2.904s) 2022-05-18T04:58:31.4599094Z 2022-05-18T04:58:31.4599482Z ---------------------------------------------------------------------- 2022-05-18T04:58:31.4599822Z Ran 1 test in 2.904s 2022-05-18T04:58:31.4599992Z 2022-05-18T04:58:31.4600113Z OK (skipped=1) 2022-05-18T04:58:31.4600610Z 2022-05-18T04:58:31.4600750Z Generating XML reports... 2022-05-18T04:58:31.4660426Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045828.xml 2022-05-18T04:58:32.6434668Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:32.6449664Z 2022-05-18T04:58:32.6449808Z Running tests... 2022-05-18T04:58:32.6450612Z ---------------------------------------------------------------------- 2022-05-18T04:58:32.6457456Z test_nccl_errors_blocking_abort (__main__.NcclErrorHandlingTest) ... 
skip: Frequently times out see https://github.com/pytorch/pytorch/issues/58920 (0.001s) 2022-05-18T04:58:32.6458016Z 2022-05-18T04:58:32.6458320Z ---------------------------------------------------------------------- 2022-05-18T04:58:32.6458663Z Ran 1 test in 0.001s 2022-05-18T04:58:32.6459150Z 2022-05-18T04:58:32.6459263Z OK (skipped=1) 2022-05-18T04:58:32.6459402Z 2022-05-18T04:58:32.6459530Z Generating XML reports... 2022-05-18T04:58:32.6493389Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045832.xml 2022-05-18T04:58:33.6719589Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:33.6735308Z 2022-05-18T04:58:33.6735817Z Running tests... 2022-05-18T04:58:33.6736286Z ---------------------------------------------------------------------- 2022-05-18T04:58:35.3313396Z test_nccl_errors_blocking_clean_exit (__main__.NcclErrorHandlingTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:58:35.3673359Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 59991 2022-05-18T04:58:35.3783045Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 59992 2022-05-18T04:58:35.3894094Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 59993 2022-05-18T04:58:36.2859971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:58:36.2864939Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:36.2897694Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:36.4938781Z skip: Need at least 3 CUDA devices (2.820s) 2022-05-18T04:58:36.4939015Z 2022-05-18T04:58:36.4939592Z ---------------------------------------------------------------------- 2022-05-18T04:58:36.4939943Z Ran 1 test in 2.820s 2022-05-18T04:58:36.4940112Z 2022-05-18T04:58:36.4940213Z OK (skipped=1) 2022-05-18T04:58:36.4940377Z 2022-05-18T04:58:36.4940514Z Generating XML reports... 2022-05-18T04:58:36.4997648Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045833.xml 2022-05-18T04:58:37.6723655Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:37.6738789Z 2022-05-18T04:58:37.6739014Z Running tests... 2022-05-18T04:58:37.6739457Z ---------------------------------------------------------------------- 2022-05-18T04:58:39.3261865Z test_nccl_errors_blocking_nonzero_exit (__main__.NcclErrorHandlingTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:58:39.3627279Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60136 2022-05-18T04:58:39.3739564Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60137 2022-05-18T04:58:39.3848728Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 60138 2022-05-18T04:58:40.2833153Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:58:40.3007715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:40.3118162Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:40.4895894Z ok (2.815s) 2022-05-18T04:58:40.4896083Z 2022-05-18T04:58:40.4896582Z ---------------------------------------------------------------------- 2022-05-18T04:58:40.4896907Z Ran 1 test in 2.816s 2022-05-18T04:58:40.4897080Z 2022-05-18T04:58:40.4897187Z OK 2022-05-18T04:58:40.4897324Z 2022-05-18T04:58:40.4897456Z Generating XML reports... 2022-05-18T04:58:40.4952196Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045837.xml 2022-05-18T04:58:41.6704371Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:41.6719978Z 2022-05-18T04:58:41.6720642Z Running tests... 2022-05-18T04:58:41.6721204Z ---------------------------------------------------------------------- 2022-05-18T04:58:43.3381774Z test_nccl_errors_blocking_sigkill (__main__.NcclErrorHandlingTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:58:43.3743259Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60281 2022-05-18T04:58:43.3851837Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60282 2022-05-18T04:58:43.3963891Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 60283 2022-05-18T04:58:44.2987589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:44.3299278Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:44.3462324Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:58:44.5013562Z ok (2.829s) 2022-05-18T04:58:44.5013785Z 2022-05-18T04:58:44.5014170Z ---------------------------------------------------------------------- 2022-05-18T04:58:44.5014523Z Ran 1 test in 2.829s 2022-05-18T04:58:44.5014694Z 2022-05-18T04:58:44.5014794Z OK 2022-05-18T04:58:44.5014933Z 2022-05-18T04:58:44.5015075Z Generating XML reports... 2022-05-18T04:58:44.5071680Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045841.xml 2022-05-18T04:58:45.6769220Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:45.6784732Z 2022-05-18T04:58:45.6785221Z Running tests... 2022-05-18T04:58:45.6785744Z ---------------------------------------------------------------------- 2022-05-18T04:58:47.3212009Z test_nccl_errors_blocking_sigterm (__main__.NcclErrorHandlingTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:58:47.3565598Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60426 2022-05-18T04:58:47.3674095Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60427 2022-05-18T04:58:47.3783600Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 60428 2022-05-18T04:58:48.2619924Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:48.3377951Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:48.3496465Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:58:48.5836982Z ok (2.905s) 2022-05-18T04:58:48.5837204Z 2022-05-18T04:58:48.5837622Z ---------------------------------------------------------------------- 2022-05-18T04:58:48.5837959Z Ran 1 test in 2.905s 2022-05-18T04:58:48.5838162Z 2022-05-18T04:58:48.5838265Z OK 2022-05-18T04:58:48.5838410Z 2022-05-18T04:58:48.5838549Z Generating XML reports... 2022-05-18T04:58:48.5896956Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045845.xml 2022-05-18T04:58:49.7540589Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:49.7555303Z 2022-05-18T04:58:49.7555570Z Running tests... 2022-05-18T04:58:49.7556028Z ---------------------------------------------------------------------- 2022-05-18T04:58:49.7572723Z test_nccl_errors_nonblocking (__main__.NcclErrorHandlingTest) ... skip: Test does not pass when run locally (0.002s) 2022-05-18T04:58:49.7573104Z 2022-05-18T04:58:49.7573396Z ---------------------------------------------------------------------- 2022-05-18T04:58:49.7573732Z Ran 1 test in 0.002s 2022-05-18T04:58:49.7573900Z 2022-05-18T04:58:49.7574013Z OK (skipped=1) 2022-05-18T04:58:49.7574172Z 2022-05-18T04:58:49.7574309Z Generating XML reports... 2022-05-18T04:58:49.7608208Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045849.xml 2022-05-18T04:58:50.7807659Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:50.7822897Z 2022-05-18T04:58:50.7823433Z Running tests... 2022-05-18T04:58:50.7823936Z ---------------------------------------------------------------------- 2022-05-18T04:58:52.4280447Z test_nccl_timeout (__main__.NcclErrorHandlingTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:58:52.4640720Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60606 2022-05-18T04:58:52.4750193Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60607 2022-05-18T04:58:52.4861887Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 60608 2022-05-18T04:58:53.3939132Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T04:58:53.3943316Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:53.3971835Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:53.5907936Z skip: Need at least 3 CUDA devices (2.808s) 2022-05-18T04:58:53.5908180Z 2022-05-18T04:58:53.5908581Z ---------------------------------------------------------------------- 2022-05-18T04:58:53.5908931Z Ran 1 test in 2.808s 2022-05-18T04:58:53.5909101Z 2022-05-18T04:58:53.5909236Z OK (skipped=1) 2022-05-18T04:58:53.5909397Z 2022-05-18T04:58:53.5909507Z Generating XML reports... 2022-05-18T04:58:53.5967843Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045850.xml 2022-05-18T04:58:54.7788179Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:54.7803242Z 2022-05-18T04:58:54.7803563Z Running tests... 2022-05-18T04:58:54.7804018Z ---------------------------------------------------------------------- 2022-05-18T04:58:54.7811409Z test_init_no_gpus (__main__.ProcessGroupNCCLNoGPUTest) ... skip: GPUs are available, skipping test (0.001s) 2022-05-18T04:58:54.7811763Z 2022-05-18T04:58:54.7812097Z ---------------------------------------------------------------------- 2022-05-18T04:58:54.7812425Z Ran 1 test in 0.001s 2022-05-18T04:58:54.7812596Z 2022-05-18T04:58:54.7812712Z OK (skipped=1) 2022-05-18T04:58:54.7812871Z 2022-05-18T04:58:54.7813001Z Generating XML reports... 2022-05-18T04:58:54.7847553Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLNoGPUTest-20220518045854.xml 2022-05-18T04:58:55.7943190Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:58:55.7958454Z 2022-05-18T04:58:55.7958897Z Running tests... 2022-05-18T04:58:55.7959433Z ---------------------------------------------------------------------- 2022-05-18T04:58:57.4175969Z test_allgather_base_basics (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:58:57.4528954Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60786 2022-05-18T04:58:57.4636732Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60787 2022-05-18T04:58:58.3710051Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:58:58.3712204Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:58:58.3733094Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:58:58.3736854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:58:58.3737707Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
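The NcclErrorHandlingTest cases above (blocking wait, timeout, abort/exit variants) revolve around NCCL's blocking-wait mode combined with a process-group timeout. The sketch below shows that configuration in isolation, not the test bodies; the address, port, and helper name are placeholders, and it assumes one GPU per rank.

# Sketch only: enable blocking wait so a stuck collective surfaces as a timeout error.
import datetime
import os
import torch.distributed as dist

def init_blocking_nccl(rank: int, world_size: int) -> None:
    os.environ["NCCL_BLOCKING_WAIT"] = "1"  # block on each collective instead of returning async work
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:29500",
        rank=rank,
        world_size=world_size,
        timeout=datetime.timedelta(seconds=10),
    )
    dist.barrier()  # with blocking wait, a missing peer shows up here as a timeout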
2022-05-18T04:58:58.3815859Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:58:59.9710462Z ok (4.175s) 2022-05-18T04:58:59.9710874Z 2022-05-18T04:58:59.9711285Z ---------------------------------------------------------------------- 2022-05-18T04:58:59.9711654Z Ran 1 test in 4.175s 2022-05-18T04:58:59.9711820Z 2022-05-18T04:58:59.9711898Z OK 2022-05-18T04:58:59.9712042Z 2022-05-18T04:58:59.9712177Z Generating XML reports... 2022-05-18T04:58:59.9754529Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045855.xml 2022-05-18T04:59:01.1487113Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:59:01.1501593Z 2022-05-18T04:59:01.1502073Z Running tests... 2022-05-18T04:59:01.1502570Z ---------------------------------------------------------------------- 2022-05-18T04:59:02.7599240Z test_allgather_base_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:59:02.7948931Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 60901 2022-05-18T04:59:02.8058275Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 60902 2022-05-18T04:59:03.7406378Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:03.7409013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:03.7605434Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:03.7610035Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:03.7610854Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:03.7614036Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:06.4169514Z ok (5.266s) 2022-05-18T04:59:06.4169747Z 2022-05-18T04:59:06.4170156Z ---------------------------------------------------------------------- 2022-05-18T04:59:06.4170712Z Ran 1 test in 5.267s 2022-05-18T04:59:06.4170885Z 2022-05-18T04:59:06.4170982Z OK 2022-05-18T04:59:06.4171119Z 2022-05-18T04:59:06.4171907Z Generating XML reports... 2022-05-18T04:59:06.4214614Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045901.xml 2022-05-18T04:59:07.6057490Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:59:07.6072629Z 2022-05-18T04:59:07.6073109Z Running tests... 2022-05-18T04:59:07.6073565Z ---------------------------------------------------------------------- 2022-05-18T04:59:09.2530423Z test_allgather_ops (__main__.ProcessGroupNCCLTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:59:09.2882625Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61024 2022-05-18T04:59:09.2991246Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61025 2022-05-18T04:59:10.2228264Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:10.2231498Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:10.2574824Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:10.2578423Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:10.2579563Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:10.2639692Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:12.9091209Z ok (5.301s) 2022-05-18T04:59:12.9094923Z 2022-05-18T04:59:12.9095365Z ---------------------------------------------------------------------- 2022-05-18T04:59:12.9095731Z Ran 1 test in 5.302s 2022-05-18T04:59:12.9095909Z 2022-05-18T04:59:12.9096007Z OK 2022-05-18T04:59:12.9096128Z 2022-05-18T04:59:12.9096267Z Generating XML reports... 2022-05-18T04:59:12.9139526Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045907.xml 2022-05-18T04:59:14.0807309Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:59:14.0822223Z 2022-05-18T04:59:14.0822623Z Running tests... 2022-05-18T04:59:14.0823132Z ---------------------------------------------------------------------- 2022-05-18T04:59:15.6958156Z test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:59:15.7311366Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61147 2022-05-18T04:59:15.7420286Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61148 2022-05-18T04:59:16.6447051Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:16.6449581Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:16.6466271Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:16.6469739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:16.6470889Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:16.6552816Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:19.3517464Z ok (5.269s) 2022-05-18T04:59:19.3517738Z 2022-05-18T04:59:19.3518155Z ---------------------------------------------------------------------- 2022-05-18T04:59:19.3518511Z Ran 1 test in 5.269s 2022-05-18T04:59:19.3518679Z 2022-05-18T04:59:19.3518779Z OK 2022-05-18T04:59:19.3518925Z 2022-05-18T04:59:19.3519053Z Generating XML reports... 
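For context on the ProcessGroupNCCLTest cases just completed, a minimal sketch of the two collectives involved (all_gather and all_reduce) is shown below. It is illustrative only: it assumes an initialized 2-rank NCCL group with one GPU per rank, and the helper name is a placeholder.

# Sketch only: all_gather collects every rank's tensor; all_reduce sums in place on every rank.
import torch
import torch.distributed as dist

def allgather_then_allreduce(rank: int, world_size: int) -> None:
    device = torch.device(f"cuda:{rank}")
    x = torch.full((4,), float(rank), device=device)

    gathered = [torch.zeros_like(x) for _ in range(world_size)]
    dist.all_gather(gathered, x)          # every rank now holds all ranks' tensors

    dist.all_reduce(x, op=dist.ReduceOp.SUM)  # every rank now holds the elementwise sum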
2022-05-18T04:59:19.3568314Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045914.xml 2022-05-18T04:59:20.5448099Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:59:20.5463357Z 2022-05-18T04:59:20.5463643Z Running tests... 2022-05-18T04:59:20.5464338Z ---------------------------------------------------------------------- 2022-05-18T04:59:22.2116526Z test_barrier (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:59:22.2478352Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61270 2022-05-18T04:59:22.2587552Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61271 2022-05-18T04:59:23.1901102Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:23.1903600Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:23.2200068Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:23.2204082Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:23.2205092Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:23.2210040Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:25.8689238Z ok (5.322s) 2022-05-18T04:59:25.8689570Z 2022-05-18T04:59:25.8690091Z ---------------------------------------------------------------------- 2022-05-18T04:59:25.8690633Z Ran 1 test in 5.323s 2022-05-18T04:59:25.8690821Z 2022-05-18T04:59:25.8690919Z OK 2022-05-18T04:59:25.8691057Z 2022-05-18T04:59:25.8691194Z Generating XML reports... 2022-05-18T04:59:25.8734658Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045920.xml 2022-05-18T04:59:27.0524793Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:59:27.0540366Z 2022-05-18T04:59:27.0540689Z Running tests... 2022-05-18T04:59:27.0541149Z ---------------------------------------------------------------------- 2022-05-18T04:59:28.6958668Z test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:59:28.7311150Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61393 2022-05-18T04:59:28.7419140Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61394 2022-05-18T04:59:29.6383542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:29.6385156Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:29.6771929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:29.6775947Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:29.6776738Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:29.6793409Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
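The barrier and broadcast tests running here correspond to the two simplest NCCL collectives; a short illustrative sketch follows, under the same assumptions as above (initialized NCCL group, one GPU per rank, placeholder helper name).

# Sketch only: broadcast rank 0's tensor to all ranks, then synchronize.
import torch
import torch.distributed as dist

def broadcast_from_rank0(rank: int) -> torch.Tensor:
    device = torch.device(f"cuda:{rank}")
    if rank == 0:
        t = torch.arange(4, dtype=torch.float32, device=device)
    else:
        t = torch.empty(4, device=device)
    dist.broadcast(t, src=0)  # after this call every rank holds rank 0's data
    dist.barrier()            # simple synchronization point across ranks
    return t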
2022-05-18T04:59:32.3528892Z ok (5.298s) 2022-05-18T04:59:32.3529175Z 2022-05-18T04:59:32.3529566Z ---------------------------------------------------------------------- 2022-05-18T04:59:32.3530025Z Ran 1 test in 5.299s 2022-05-18T04:59:32.3530486Z 2022-05-18T04:59:32.3530856Z OK 2022-05-18T04:59:32.3531051Z 2022-05-18T04:59:32.3531276Z Generating XML reports... 2022-05-18T04:59:32.3574391Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045927.xml 2022-05-18T04:59:33.5485080Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:59:33.5500448Z 2022-05-18T04:59:33.5500945Z Running tests... 2022-05-18T04:59:33.5501620Z ---------------------------------------------------------------------- 2022-05-18T04:59:35.1987177Z test_empty_tensors (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:59:35.2349812Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61516 2022-05-18T04:59:35.2459990Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61517 2022-05-18T04:59:36.0946971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:36.0949440Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:36.1481335Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:36.1485055Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:36.1485880Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:36.1561101Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:38.8571996Z ok (5.307s) 2022-05-18T04:59:38.8572299Z 2022-05-18T04:59:38.8572679Z ---------------------------------------------------------------------- 2022-05-18T04:59:38.8573042Z Ran 1 test in 5.307s 2022-05-18T04:59:38.8573214Z 2022-05-18T04:59:38.8573314Z OK 2022-05-18T04:59:38.8573454Z 2022-05-18T04:59:38.8573685Z Generating XML reports... 2022-05-18T04:59:38.8618750Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045933.xml 2022-05-18T04:59:40.0421571Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:59:40.0436815Z 2022-05-18T04:59:40.0437060Z Running tests... 2022-05-18T04:59:40.0437509Z ---------------------------------------------------------------------- 2022-05-18T04:59:41.6868066Z test_gather_checks (__main__.ProcessGroupNCCLTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:59:41.7219995Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61639 2022-05-18T04:59:41.7328514Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61640 2022-05-18T04:59:42.6543324Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:42.6545343Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:42.7020787Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:42.7024160Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:42.7024982Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:42.7053915Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:44.3402632Z ok (4.296s) 2022-05-18T04:59:44.3402899Z 2022-05-18T04:59:44.3403435Z ---------------------------------------------------------------------- 2022-05-18T04:59:44.3403793Z Ran 1 test in 4.297s 2022-05-18T04:59:44.3403961Z 2022-05-18T04:59:44.3404055Z OK 2022-05-18T04:59:44.3404190Z 2022-05-18T04:59:44.3404327Z Generating XML reports... 2022-05-18T04:59:44.3446991Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045940.xml 2022-05-18T04:59:45.5305981Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:59:45.5320927Z 2022-05-18T04:59:45.5321091Z Running tests... 2022-05-18T04:59:45.5321803Z ---------------------------------------------------------------------- 2022-05-18T04:59:47.1609650Z test_gather_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:59:47.1961623Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61754 2022-05-18T04:59:47.2070328Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61755 2022-05-18T04:59:48.1403593Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:48.1406078Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:48.1524628Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:48.1527910Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:48.1528740Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:48.1611001Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:50.9172241Z ok (5.385s) 2022-05-18T04:59:50.9172486Z 2022-05-18T04:59:50.9172888Z ---------------------------------------------------------------------- 2022-05-18T04:59:50.9173275Z Ran 1 test in 5.385s 2022-05-18T04:59:50.9173448Z 2022-05-18T04:59:50.9173555Z OK 2022-05-18T04:59:50.9173698Z 2022-05-18T04:59:50.9173840Z Generating XML reports... 
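The gather tests above move data to a single destination rank. A minimal sketch of that pattern is below (illustrative only; assumes an initialized 2-rank NCCL group and a placeholder helper name). Only the destination rank allocates the output list.

# Sketch only: gather every rank's tensor onto rank 0.
import torch
import torch.distributed as dist

def gather_to_rank0(rank: int, world_size: int) -> None:
    device = torch.device(f"cuda:{rank}")
    x = torch.full((2,), float(rank), device=device)
    # Non-destination ranks pass gather_list=None.
    gather_list = [torch.zeros_like(x) for _ in range(world_size)] if rank == 0 else None
    dist.gather(x, gather_list=gather_list, dst=0)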
2022-05-18T04:59:50.9218118Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045945.xml 2022-05-18T04:59:52.0937957Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T04:59:52.0953140Z 2022-05-18T04:59:52.0953427Z Running tests... 2022-05-18T04:59:52.0953879Z ---------------------------------------------------------------------- 2022-05-18T04:59:53.6971639Z test_gather_stress (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T04:59:53.7330822Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 61877 2022-05-18T04:59:53.7443457Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 61878 2022-05-18T04:59:54.6782318Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T04:59:54.6782862Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T04:59:54.6785175Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T04:59:54.6785909Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T04:59:54.6786739Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:54.6787442Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T04:59:59.5583913Z ok (7.463s) 2022-05-18T04:59:59.5584186Z 2022-05-18T04:59:59.5584600Z ---------------------------------------------------------------------- 2022-05-18T04:59:59.5584958Z Ran 1 test in 7.463s 2022-05-18T04:59:59.5585147Z 2022-05-18T04:59:59.5585244Z OK 2022-05-18T04:59:59.5585382Z 2022-05-18T04:59:59.5585499Z Generating XML reports... 2022-05-18T04:59:59.5629321Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045952.xml 2022-05-18T05:00:00.7564396Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T05:00:00.7579533Z 2022-05-18T05:00:00.7579763Z Running tests... 2022-05-18T05:00:00.7580435Z ---------------------------------------------------------------------- 2022-05-18T05:00:02.4190041Z test_reduce_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:00:02.4561591Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62000 2022-05-18T05:00:02.4676297Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62001 2022-05-18T05:00:03.4294237Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:03.4296672Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:03.4394100Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:03.4397596Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:03.4398814Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:03.4399537Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
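The test_gather_checks / test_gather_ops / test_gather_stress cases above exercise the gather collective on ProcessGroupNCCL. A hypothetical two-rank example of the same collective through the public torch.distributed API (a toy script, not the test code):

# Hypothetical two-rank gather over NCCL; a toy script, not ProcessGroupNCCLTest.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def run(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    t = torch.full((4,), float(rank), device="cuda")
    # Only the destination rank supplies the output list.
    gather_list = ([torch.empty(4, device="cuda") for _ in range(world_size)]
                   if rank == 0 else None)
    dist.gather(t, gather_list, dst=0)
    if rank == 0:
        print([g.tolist() for g in gather_list])  # rank 0 sees [0.0...] and [1.0...]
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(run, args=(2,), nprocs=2)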
2022-05-18T05:00:06.1777864Z ok (5.419s) 2022-05-18T05:00:06.1778262Z 2022-05-18T05:00:06.1778897Z ---------------------------------------------------------------------- 2022-05-18T05:00:06.1779545Z Ran 1 test in 5.420s 2022-05-18T05:00:06.1779846Z 2022-05-18T05:00:06.1780018Z OK 2022-05-18T05:00:06.1780264Z 2022-05-18T05:00:06.1780528Z Generating XML reports... 2022-05-18T05:00:06.1824571Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050000.xml 2022-05-18T05:00:07.3714343Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T05:00:07.3729415Z 2022-05-18T05:00:07.3729966Z Running tests... 2022-05-18T05:00:07.3730591Z ---------------------------------------------------------------------- 2022-05-18T05:00:09.0248568Z test_reduce_scatter_base_basics (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:00:09.0598423Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62123 2022-05-18T05:00:09.0707602Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62124 2022-05-18T05:00:09.9194985Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:09.9197072Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:09.9751131Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:09.9754819Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:09.9755836Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:09.9808251Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:11.5789680Z ok (4.206s) 2022-05-18T05:00:11.5789883Z 2022-05-18T05:00:11.5790474Z ---------------------------------------------------------------------- 2022-05-18T05:00:11.5790835Z Ran 1 test in 4.206s 2022-05-18T05:00:11.5791010Z 2022-05-18T05:00:11.5791116Z OK 2022-05-18T05:00:11.5791254Z 2022-05-18T05:00:11.5791372Z Generating XML reports... 2022-05-18T05:00:11.5834150Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050007.xml 2022-05-18T05:00:12.7700394Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T05:00:12.7715337Z 2022-05-18T05:00:12.7715804Z Running tests... 2022-05-18T05:00:12.7716317Z ---------------------------------------------------------------------- 2022-05-18T05:00:14.4229410Z test_reduce_scatter_base_ops (__main__.ProcessGroupNCCLTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:00:14.4592255Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62238 2022-05-18T05:00:14.4702972Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62239 2022-05-18T05:00:15.4122166Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:15.4124321Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:15.4467002Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:15.4470580Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:15.4471416Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:15.4532922Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:18.0798994Z ok (5.308s) 2022-05-18T05:00:18.0799377Z 2022-05-18T05:00:18.0799817Z ---------------------------------------------------------------------- 2022-05-18T05:00:18.0800152Z Ran 1 test in 5.308s 2022-05-18T05:00:18.0800324Z 2022-05-18T05:00:18.0800421Z OK 2022-05-18T05:00:18.0800560Z 2022-05-18T05:00:18.0800696Z Generating XML reports... 2022-05-18T05:00:18.0842677Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050012.xml 2022-05-18T05:00:19.2656616Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T05:00:19.2671740Z 2022-05-18T05:00:19.2672265Z Running tests... 2022-05-18T05:00:19.2672790Z ---------------------------------------------------------------------- 2022-05-18T05:00:20.9208679Z test_reduce_scatter_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:00:20.9560138Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62361 2022-05-18T05:00:20.9667847Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62362 2022-05-18T05:00:21.8782541Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:21.8784768Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:21.8801952Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:21.8805739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:21.8806659Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:21.8887842Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:24.5766331Z ok (5.309s) 2022-05-18T05:00:24.5766721Z 2022-05-18T05:00:24.5767150Z ---------------------------------------------------------------------- 2022-05-18T05:00:24.5767482Z Ran 1 test in 5.309s 2022-05-18T05:00:24.5767657Z 2022-05-18T05:00:24.5767757Z OK 2022-05-18T05:00:24.5767894Z 2022-05-18T05:00:24.5768034Z Generating XML reports... 
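The reduce_scatter tests above cover both the list-based collective and the "_base" variant that operates on a single flattened tensor. A sketch of the list-based form, assuming a two-rank NCCL group is already initialized as in the gather example:

# Illustration of the list-based reduce_scatter exercised above; assumes the
# two-rank NCCL group from the gather sketch is already initialized.
import torch
import torch.distributed as dist

def reduce_scatter_demo(rank, world_size):
    # Each rank contributes one input chunk per destination rank...
    inputs = [torch.full((4,), float(rank), device="cuda")
              for _ in range(world_size)]
    output = torch.empty(4, device="cuda")
    # ...and receives the element-wise sum of its own chunk across all ranks.
    dist.reduce_scatter(output, inputs, op=dist.ReduceOp.SUM)
    return output  # with 2 ranks every element equals 0 + 1 = 1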
2022-05-18T05:00:24.5811124Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050019.xml 2022-05-18T05:00:25.7621879Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T05:00:25.7636449Z 2022-05-18T05:00:25.7636897Z Running tests... 2022-05-18T05:00:25.7637403Z ---------------------------------------------------------------------- 2022-05-18T05:00:27.4200893Z test_scatter_checks (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:00:27.4562999Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62484 2022-05-18T05:00:27.4673697Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62485 2022-05-18T05:00:28.3985275Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:28.3987728Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:28.4227538Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:28.4231754Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:28.4232576Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:28.4294240Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:29.9747076Z ok (4.211s) 2022-05-18T05:00:29.9747414Z 2022-05-18T05:00:29.9747977Z ---------------------------------------------------------------------- 2022-05-18T05:00:29.9748313Z Ran 1 test in 4.211s 2022-05-18T05:00:29.9748483Z 2022-05-18T05:00:29.9748580Z OK 2022-05-18T05:00:29.9748718Z 2022-05-18T05:00:29.9751865Z Generating XML reports... 2022-05-18T05:00:29.9791020Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050025.xml 2022-05-18T05:00:31.1588235Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T05:00:31.1604404Z 2022-05-18T05:00:31.1604905Z Running tests... 2022-05-18T05:00:31.1605400Z ---------------------------------------------------------------------- 2022-05-18T05:00:32.8098041Z test_scatter_ops (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:00:32.8459454Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62599 2022-05-18T05:00:32.8571814Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62600 2022-05-18T05:00:33.7612613Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:33.7615278Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:33.7905084Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:33.7908403Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:33.7909443Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:33.7921451Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:00:36.4670459Z ok (5.306s) 2022-05-18T05:00:36.4670863Z 2022-05-18T05:00:36.4671548Z ---------------------------------------------------------------------- 2022-05-18T05:00:36.4672589Z Ran 1 test in 5.307s 2022-05-18T05:00:36.4672950Z 2022-05-18T05:00:36.4673128Z OK 2022-05-18T05:00:36.4673379Z 2022-05-18T05:00:36.4673638Z Generating XML reports... 2022-05-18T05:00:36.4718247Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050031.xml 2022-05-18T05:00:37.6481032Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T05:00:37.6496541Z 2022-05-18T05:00:37.6496788Z Running tests... 2022-05-18T05:00:37.6497465Z ---------------------------------------------------------------------- 2022-05-18T05:00:39.3005231Z test_scatter_stress (__main__.ProcessGroupNCCLTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:00:39.3365791Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62722 2022-05-18T05:00:39.3474997Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62723 2022-05-18T05:00:40.2814119Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:00:40.2816848Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:40.3050482Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:00:40.3054611Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:40.3055414Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:40.3123724Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:45.2615602Z ok (7.612s) 2022-05-18T05:00:45.2615865Z 2022-05-18T05:00:45.2616277Z ---------------------------------------------------------------------- 2022-05-18T05:00:45.2616622Z Ran 1 test in 7.612s 2022-05-18T05:00:45.2616798Z 2022-05-18T05:00:45.2616913Z OK 2022-05-18T05:00:45.2617052Z 2022-05-18T05:00:45.2617171Z Generating XML reports... 2022-05-18T05:00:45.2660356Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050037.xml 2022-05-18T05:00:46.4554710Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T05:00:46.4569689Z 2022-05-18T05:00:46.4570181Z Running tests... 2022-05-18T05:00:46.4570780Z ---------------------------------------------------------------------- 2022-05-18T05:00:48.1184535Z test_common_errors (__main__.RendezvousEnvTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:00:48.1327314Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:48.1328271Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:00:48.1352424Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:48.1353316Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
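RendezvousEnvTest.test_common_errors (starting above) exercises the env:// rendezvous, which reads MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE from the environment, and TimeoutTest (next in the log) covers store timeouts. A minimal single-rank sketch with placeholder values:

# Hypothetical single-rank env:// initialization of the kind RendezvousEnvTest
# and TimeoutTest exercise; the address and port are placeholders.
import os
from datetime import timedelta
import torch.distributed as dist

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

# Leaving out any of the variables above makes init_process_group raise,
# which is the kind of failure test_common_errors asserts on.
dist.init_process_group("nccl", init_method="env://",
                        timeout=timedelta(seconds=60))
dist.destroy_process_group()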
2022-05-18T05:00:48.1374157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:48.1375330Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:00:48.1394986Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:48.1396248Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:00:48.1474487Z ok (1.690s) 2022-05-18T05:00:48.1475136Z 2022-05-18T05:00:48.1475614Z ---------------------------------------------------------------------- 2022-05-18T05:00:48.1476329Z Ran 1 test in 1.691s 2022-05-18T05:00:48.1476656Z 2022-05-18T05:00:48.1476915Z OK 2022-05-18T05:00:48.1477071Z 2022-05-18T05:00:48.1477203Z Generating XML reports... 2022-05-18T05:00:48.1509947Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-RendezvousEnvTest-20220518050046.xml 2022-05-18T05:00:49.2932794Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_nccl 2022-05-18T05:00:49.2947718Z 2022-05-18T05:00:49.2947906Z Running tests... 2022-05-18T05:00:49.2948340Z ---------------------------------------------------------------------- 2022-05-18T05:00:50.9582747Z test_default_store_timeout_nccl (__main__.TimeoutTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:00:50.9714182Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:50.9714963Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:00:52.9864207Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:52.9865268Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:00:53.9975245Z ok (4.702s) 2022-05-18T05:00:53.9975460Z 2022-05-18T05:00:53.9975835Z ---------------------------------------------------------------------- 2022-05-18T05:00:53.9976196Z Ran 1 test in 4.703s 2022-05-18T05:00:53.9976455Z 2022-05-18T05:00:53.9976636Z OK 2022-05-18T05:00:53.9976882Z 2022-05-18T05:00:53.9977034Z Generating XML reports... 2022-05-18T05:00:54.0017605Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_nccl/TEST-TimeoutTest-20220518050049.xml 2022-05-18T05:00:54.3933030Z Running distributed/fsdp/test_fsdp_mixed_precision ... [2022-05-18 05:00:54.392711] 2022-05-18T05:00:54.3933844Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_mixed_precision.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:00:54.392816] 2022-05-18T05:00:56.9391521Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:00:56.9451588Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_mixed_precision 2022-05-18T05:00:56.9482339Z 2022-05-18T05:00:56.9482597Z Running tests... 2022-05-18T05:00:56.9483046Z ---------------------------------------------------------------------- 2022-05-18T05:00:56.9859813Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_false_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
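The TestFSDPMixedPrecisionSharded names that follow encode a parameter grid: the mixed-precision policy (mp_diff_buffer_reduce, mp_fp16, ...), CPU offload on or off, backward prefetch post or pre, the full-precision dtype of the original parameters (fp32/fp64), and whether a sharded gradient scaler is used. A hedged sketch of how such a configuration is expressed with the public FSDP API; the model and dtype choices below are placeholders, not the test's own settings:

# Illustrative FSDP configuration along the axes the test grid varies; the
# model and dtypes are placeholders. Assumes torch.distributed is already
# initialized (e.g. as in the earlier sketches).
import torch
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    CPUOffload,
    BackwardPrefetch,
)

model = nn.Linear(8, 8).cuda()

mp_policy = MixedPrecision(
    param_dtype=torch.float16,   # dtype parameters are cast to for compute/comm
    reduce_dtype=torch.float32,  # dtype used when reducing gradients
    buffer_dtype=torch.float16,  # dtype buffers are cast to
)

fsdp_model = FSDP(
    model,
    mixed_precision=mp_policy,
    cpu_offload=CPUOffload(offload_params=False),
    backward_prefetch=BackwardPrefetch.BACKWARD_POST,
)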
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 62941 2022-05-18T05:00:56.9971008Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 62942 2022-05-18T05:00:59.6031593Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:00:59.6058533Z dist init r=0, world=2 2022-05-18T05:00:59.6063275Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:00:59.6186335Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:00:59.6214458Z dist init r=1, world=2 2022-05-18T05:00:59.6219217Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:00:59.6220138Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:00:59.6268200Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:00.6477758Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:00.6478598Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:01.5087458Z ok (4.560s) 2022-05-18T05:01:01.5218574Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_false_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63029 2022-05-18T05:01:01.5325679Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63030 2022-05-18T05:01:04.1611802Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:04.1638646Z dist init r=1, world=2 2022-05-18T05:01:04.1643994Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:04.1881394Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:04.1911497Z dist init r=0, world=2 2022-05-18T05:01:04.1916810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:04.1917823Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:04.1950276Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:05.2260675Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:05.2261248Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:06.0440764Z ok (4.535s) 2022-05-18T05:01:06.0570679Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_false_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63117 2022-05-18T05:01:06.0677536Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63118 2022-05-18T05:01:08.6774373Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:08.6777417Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:08.6805496Z dist init r=0, world=2 2022-05-18T05:01:08.6805766Z dist init r=1, world=2 2022-05-18T05:01:08.6811210Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:08.6811728Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:08.6812533Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:08.6813237Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:09.7211101Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:09.7211651Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:10.4791281Z ok (4.435s) 2022-05-18T05:01:10.4923312Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_false_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63205 2022-05-18T05:01:10.5029162Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63206 2022-05-18T05:01:13.1073723Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:13.1105508Z dist init r=0, world=2 2022-05-18T05:01:13.1110372Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:13.1268768Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:13.1298553Z dist init r=1, world=2 2022-05-18T05:01:13.1303367Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:13.1304637Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:13.1315430Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:14.1664022Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:14.1664567Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:15.0141290Z ok (4.535s) 2022-05-18T05:01:15.0270442Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_false_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63293 2022-05-18T05:01:15.0377617Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63294 2022-05-18T05:01:17.6184880Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:17.6214739Z dist init r=0, world=2 2022-05-18T05:01:17.6220357Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:17.6258230Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:17.6285946Z dist init r=1, world=2 2022-05-18T05:01:17.6290476Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:17.6291709Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:17.6324113Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:18.6606341Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:18.6606890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:19.5506107Z ok (4.536s) 2022-05-18T05:01:19.5644432Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_false_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63381 2022-05-18T05:01:19.5759791Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63382 2022-05-18T05:01:22.1718664Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:22.1746325Z dist init r=1, world=2 2022-05-18T05:01:22.1751491Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:22.1911597Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:22.1941529Z dist init r=0, world=2 2022-05-18T05:01:22.1946394Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:22.1947313Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:22.1956617Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:23.2075511Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:23.2076511Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:23.9873478Z ok (4.437s) 2022-05-18T05:01:24.0003894Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_false_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63469 2022-05-18T05:01:24.0112151Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63470 2022-05-18T05:01:26.6292390Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:26.6319579Z dist init r=0, world=2 2022-05-18T05:01:26.6324948Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:26.6392369Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:26.6422240Z dist init r=1, world=2 2022-05-18T05:01:26.6427372Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:26.6429863Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:26.6430615Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:27.6785436Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:27.6786009Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:28.5226638Z ok (4.535s) 2022-05-18T05:01:28.5355403Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_false_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63557 2022-05-18T05:01:28.5462443Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63558 2022-05-18T05:01:31.1488884Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:31.1517024Z dist init r=0, world=2 2022-05-18T05:01:31.1523190Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:31.1579115Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:31.1608699Z dist init r=1, world=2 2022-05-18T05:01:31.1614250Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:31.1615655Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:31.1626595Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:32.1970649Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:32.1971703Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:33.0577140Z ok (4.535s) 2022-05-18T05:01:33.0711015Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_true_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63645 2022-05-18T05:01:33.0818473Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63646 2022-05-18T05:01:35.6763275Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:35.6790158Z dist init r=0, world=2 2022-05-18T05:01:35.6795407Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:35.6858581Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:35.6887762Z dist init r=1, world=2 2022-05-18T05:01:35.6893355Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:35.6894530Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:35.6898585Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:36.7427751Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:36.7428311Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:37.5934155Z ok (4.535s) 2022-05-18T05:01:37.6073674Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_true_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63733 2022-05-18T05:01:37.6189009Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63734 2022-05-18T05:01:40.2390591Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:40.2417768Z dist init r=0, world=2 2022-05-18T05:01:40.2422183Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:40.2585164Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:40.2614578Z dist init r=1, world=2 2022-05-18T05:01:40.2619968Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:40.2621612Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:40.2627177Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:41.3002777Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:41.3003317Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:42.1303514Z ok (4.537s) 2022-05-18T05:01:42.1436279Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_true_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63821 2022-05-18T05:01:42.1543883Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63822 2022-05-18T05:01:44.7613564Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:44.7640941Z dist init r=1, world=2 2022-05-18T05:01:44.7646484Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:44.7659677Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:44.7688937Z dist init r=0, world=2 2022-05-18T05:01:44.7694094Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:44.7695311Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:44.7750256Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:45.7863514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:45.7864064Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:46.5656994Z ok (4.435s) 2022-05-18T05:01:46.5788894Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_true_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63909 2022-05-18T05:01:46.5896914Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63910 2022-05-18T05:01:49.1847686Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:49.1875433Z dist init r=0, world=2 2022-05-18T05:01:49.1880507Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:49.2222860Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:49.2251350Z dist init r=1, world=2 2022-05-18T05:01:49.2257369Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:49.2258586Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:49.2289548Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:50.2786089Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:50.2787139Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:51.1008127Z ok (4.535s) 2022-05-18T05:01:51.1139577Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_true_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 63997 2022-05-18T05:01:51.1246818Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 63998 2022-05-18T05:01:53.7482216Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:53.7505444Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:53.7509183Z dist init r=0, world=2 2022-05-18T05:01:53.7514485Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:53.7539126Z dist init r=1, world=2 2022-05-18T05:01:53.7544125Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:53.7545576Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:53.7618393Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:54.7955061Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:54.7956108Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:01:55.6360286Z ok (4.535s) 2022-05-18T05:01:55.6496115Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_true_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64085 2022-05-18T05:01:55.6606064Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64086 2022-05-18T05:01:58.2101535Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:58.2129136Z dist init r=0, world=2 2022-05-18T05:01:58.2135169Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:01:58.2436779Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:01:58.2466024Z dist init r=1, world=2 2022-05-18T05:01:58.2470613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:01:58.2471446Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:58.2544515Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:01:59.2851869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:01:59.2852913Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:00.1718633Z ok (4.536s) 2022-05-18T05:02:00.1850323Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_true_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64173 2022-05-18T05:02:00.1959341Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64174 2022-05-18T05:02:02.8285637Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:02.8313141Z dist init r=0, world=2 2022-05-18T05:02:02.8318235Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:02.8349364Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:02.8378900Z dist init r=1, world=2 2022-05-18T05:02:02.8383496Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:02.8384623Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:02.8421760Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:03.8782580Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:03.8783120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:04.7074626Z ok (4.535s) 2022-05-18T05:02:04.7208143Z test_mixed_precision_e2e_full_shard_mp_diff_buffer_reduce_offload_true_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64261 2022-05-18T05:02:04.7316407Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64262 2022-05-18T05:02:07.3440571Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:07.3467920Z dist init r=0, world=2 2022-05-18T05:02:07.3472534Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:07.3529154Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:07.3558537Z dist init r=1, world=2 2022-05-18T05:02:07.3563608Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:07.3564687Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:07.3575637Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:08.4053209Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:08.4053762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:09.2428875Z ok (4.535s) 2022-05-18T05:02:09.2558477Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64349 2022-05-18T05:02:09.2665753Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64350 2022-05-18T05:02:11.8698836Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:11.8725450Z dist init r=1, world=2 2022-05-18T05:02:11.8730498Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:11.8919896Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:11.8949010Z dist init r=0, world=2 2022-05-18T05:02:11.8953436Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:11.8954544Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:11.9037325Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:12.9266681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:12.9267228Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:13.6780334Z ok (4.435s) 2022-05-18T05:02:13.6910168Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64437 2022-05-18T05:02:13.7018067Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64438 2022-05-18T05:02:16.2961619Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:16.2988536Z dist init r=1, world=2 2022-05-18T05:02:16.2993607Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:16.3295510Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:16.3324107Z dist init r=0, world=2 2022-05-18T05:02:16.3328450Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:16.3329586Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:16.3401908Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:17.3498541Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:17.3499404Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:18.2133793Z ok (4.535s) 2022-05-18T05:02:18.2266897Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
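The *_sharded_grad_scaler variants in this grid pair the fp16 policy with a gradient scaler that understands sharded gradients. A rough training-step sketch, assuming fsdp_model is the FSDP-wrapped module from the earlier sketch and the process group is initialized:

# Rough training-step sketch with the sharded gradient scaler; assumes
# fsdp_model is the FSDP-wrapped module from the earlier sketch.
import torch
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler

scaler = ShardedGradScaler()
optim = torch.optim.SGD(fsdp_model.parameters(), lr=1e-3)

inp = torch.randn(4, 8, device="cuda")
loss = fsdp_model(inp).sum()

scaler.scale(loss).backward()  # scale the loss before backward
scaler.step(optim)             # unscale and skip the step on inf/nan gradients
scaler.update()                # adjust the scale for the next iteration
optim.zero_grad()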
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64525 2022-05-18T05:02:18.2376007Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64526 2022-05-18T05:02:20.8936713Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:20.8963549Z dist init r=0, world=2 2022-05-18T05:02:20.8968537Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:20.9114943Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:20.9144504Z dist init r=1, world=2 2022-05-18T05:02:20.9149060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:20.9150037Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:20.9173842Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:21.9551836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:21.9552373Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:22.7489786Z ok (4.535s) 2022-05-18T05:02:22.7621189Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64613 2022-05-18T05:02:22.7728011Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64614 2022-05-18T05:02:25.3804223Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:25.3830972Z dist init r=0, world=2 2022-05-18T05:02:25.3836071Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:25.3973717Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:25.4003311Z dist init r=1, world=2 2022-05-18T05:02:25.4007962Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:25.4009513Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:25.4041617Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:26.4454336Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:26.4454864Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:27.2841600Z ok (4.535s) 2022-05-18T05:02:27.2973713Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64701 2022-05-18T05:02:27.3083148Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64702 2022-05-18T05:02:29.9624397Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:29.9651556Z dist init r=0, world=2 2022-05-18T05:02:29.9656778Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:29.9694996Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:29.9723882Z dist init r=1, world=2 2022-05-18T05:02:29.9729347Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:29.9731833Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:29.9759928Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:31.0233167Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:31.0233732Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:31.8200644Z ok (4.536s) 2022-05-18T05:02:31.8336735Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64789 2022-05-18T05:02:31.8446731Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64790 2022-05-18T05:02:34.4604038Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:34.4631270Z dist init r=0, world=2 2022-05-18T05:02:34.4636550Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:34.4740589Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:34.4769150Z dist init r=1, world=2 2022-05-18T05:02:34.4773753Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:34.4774556Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:34.4841205Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:35.5137507Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:35.5138027Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:36.2557488Z ok (4.436s) 2022-05-18T05:02:36.2686109Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64877 2022-05-18T05:02:36.2793124Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64878 2022-05-18T05:02:38.8763507Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:38.8790851Z dist init r=1, world=2 2022-05-18T05:02:38.8796022Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:38.8977449Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:38.9006325Z dist init r=0, world=2 2022-05-18T05:02:38.9011153Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:38.9012244Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:38.9103054Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:39.9343336Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:39.9343882Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:40.6906629Z ok (4.435s) 2022-05-18T05:02:40.7039245Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_false_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 64965 2022-05-18T05:02:40.7146916Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 64966 2022-05-18T05:02:43.3036578Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:43.3065201Z dist init r=0, world=2 2022-05-18T05:02:43.3070440Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:43.3174240Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:43.3203129Z dist init r=1, world=2 2022-05-18T05:02:43.3208033Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:43.3209409Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:43.3275462Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:44.3637196Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:44.3638386Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:45.2262695Z ok (4.535s) 2022-05-18T05:02:45.2394901Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65053 2022-05-18T05:02:45.2504735Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65054 2022-05-18T05:02:47.8465769Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:47.8494023Z dist init r=0, world=2 2022-05-18T05:02:47.8498759Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:47.8763564Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:47.8792705Z dist init r=1, world=2 2022-05-18T05:02:47.8797797Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:47.8798748Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:47.8805934Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:48.9183880Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:48.9184925Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:49.7620795Z ok (4.536s) 2022-05-18T05:02:49.7751246Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65141 2022-05-18T05:02:49.7860686Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65142 2022-05-18T05:02:52.3904819Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:52.3931749Z dist init r=1, world=2 2022-05-18T05:02:52.3936940Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:52.4078645Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:52.4107329Z dist init r=0, world=2 2022-05-18T05:02:52.4112072Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:52.4113198Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:52.4141786Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:53.4308784Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:53.4309363Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:54.2977605Z ok (4.536s) 2022-05-18T05:02:54.3107451Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65229 2022-05-18T05:02:54.3216246Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65230 2022-05-18T05:02:56.9015829Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:56.9042111Z dist init r=1, world=2 2022-05-18T05:02:56.9047210Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:02:56.9388624Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:02:56.9417671Z dist init r=0, world=2 2022-05-18T05:02:56.9422271Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:02:56.9423410Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:56.9455592Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:02:57.9545052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:02:57.9545590Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:02:58.7328678Z ok (4.435s) 2022-05-18T05:02:58.7461264Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65317 2022-05-18T05:02:58.7569238Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65318 2022-05-18T05:03:01.3547679Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:01.3575041Z dist init r=1, world=2 2022-05-18T05:03:01.3580026Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:01.3908023Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:01.3937580Z dist init r=0, world=2 2022-05-18T05:03:01.3942208Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:01.3943273Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:01.3988352Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:02.4336156Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:02.4336751Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:03.2684567Z ok (4.535s) 2022-05-18T05:03:03.2816743Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65405 2022-05-18T05:03:03.2925242Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65406 2022-05-18T05:03:05.9026060Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:05.9052363Z dist init r=0, world=2 2022-05-18T05:03:05.9057620Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:05.9573615Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:05.9602574Z dist init r=1, world=2 2022-05-18T05:03:05.9607478Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:05.9608310Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:05.9669187Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:06.9985108Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:06.9985661Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:07.8041224Z ok (4.536s) 2022-05-18T05:03:07.8171869Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65493 2022-05-18T05:03:07.8279889Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65494 2022-05-18T05:03:10.3724465Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:10.3753220Z dist init r=0, world=2 2022-05-18T05:03:10.3758095Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:10.4101209Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:10.4130486Z dist init r=1, world=2 2022-05-18T05:03:10.4135707Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:10.4136739Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:10.4166007Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:11.4535164Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:11.4535725Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:12.2396951Z ok (4.435s) 2022-05-18T05:03:12.2527525Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65581 2022-05-18T05:03:12.2635500Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65582 2022-05-18T05:03:14.8842878Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:14.8881286Z dist init r=0, world=2 2022-05-18T05:03:14.8886131Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:14.8942030Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:14.8971238Z dist init r=1, world=2 2022-05-18T05:03:14.8976175Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:14.8976994Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:14.8989248Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:15.9304212Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:16.7751236Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:16.7751662Z ok (4.535s) 2022-05-18T05:03:16.7888534Z test_mixed_precision_e2e_full_shard_mp_fp16_offload_true_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65669 2022-05-18T05:03:16.7999357Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65670 2022-05-18T05:03:19.4196743Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:19.4224246Z dist init r=0, world=2 2022-05-18T05:03:19.4229309Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:19.4315516Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:19.4345439Z dist init r=1, world=2 2022-05-18T05:03:19.4350608Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:19.4353381Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:19.4434810Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:20.4745044Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:20.4745569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:21.3112767Z ok (4.536s) 2022-05-18T05:03:21.3243100Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65757 2022-05-18T05:03:21.3351719Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65758 2022-05-18T05:03:23.9337914Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:23.9364604Z dist init r=1, world=2 2022-05-18T05:03:23.9369770Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:23.9544092Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:23.9573385Z dist init r=0, world=2 2022-05-18T05:03:23.9578010Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:23.9579200Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:23.9675987Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:24.9755604Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:24.9756176Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:25.7463151Z ok (4.435s) 2022-05-18T05:03:25.7594379Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65845 2022-05-18T05:03:25.7702371Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65846 2022-05-18T05:03:28.4030066Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:28.4056890Z dist init r=0, world=2 2022-05-18T05:03:28.4061888Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:28.4134360Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:28.4162882Z dist init r=1, world=2 2022-05-18T05:03:28.4167538Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:28.4168490Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:28.4267352Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:29.4551753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:29.4552297Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:30.2818671Z ok (4.535s) 2022-05-18T05:03:30.2949573Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 65933 2022-05-18T05:03:30.3057627Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 65934 2022-05-18T05:03:32.9011545Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:32.9039079Z dist init r=0, world=2 2022-05-18T05:03:32.9044390Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:32.9070081Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:32.9098342Z dist init r=1, world=2 2022-05-18T05:03:32.9102915Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:32.9103955Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:32.9147848Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:33.9182123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:33.9182660Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:34.7188446Z ok (4.437s) 2022-05-18T05:03:34.7321953Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66021 2022-05-18T05:03:34.7431525Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66022 2022-05-18T05:03:37.3318013Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:37.3344578Z dist init r=0, world=2 2022-05-18T05:03:37.3349328Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:37.3535950Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:37.3564959Z dist init r=1, world=2 2022-05-18T05:03:37.3569540Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:37.3570432Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:37.3656263Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:38.3936436Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:38.3936980Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:39.1544599Z ok (4.435s) 2022-05-18T05:03:39.1677707Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66109 2022-05-18T05:03:39.1788502Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66110 2022-05-18T05:03:41.7898396Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:41.7924749Z dist init r=0, world=2 2022-05-18T05:03:41.7929924Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:41.8090082Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:41.8119609Z dist init r=1, world=2 2022-05-18T05:03:41.8124267Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:41.8125508Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:41.8134233Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:42.8691546Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:42.8692083Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:43.6903419Z ok (4.536s) 2022-05-18T05:03:43.7033991Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66197 2022-05-18T05:03:43.7141571Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66198 2022-05-18T05:03:46.3352556Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:46.3380488Z dist init r=0, world=2 2022-05-18T05:03:46.3382315Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:46.3385848Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:46.3411387Z dist init r=1, world=2 2022-05-18T05:03:46.3416120Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:46.3417114Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:46.3489474Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:47.3978274Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:47.3979165Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:48.2256631Z ok (4.535s) 2022-05-18T05:03:48.2388338Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66285 2022-05-18T05:03:48.2499196Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66286 2022-05-18T05:03:50.8684595Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:50.8711409Z dist init r=0, world=2 2022-05-18T05:03:50.8716732Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:50.8752447Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:50.8786479Z dist init r=1, world=2 2022-05-18T05:03:50.8791160Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:50.8792256Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:50.8820072Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:51.9261474Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:51.9262453Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:52.6612790Z ok (4.435s) 2022-05-18T05:03:52.6750154Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_false_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66373 2022-05-18T05:03:52.6856961Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66374 2022-05-18T05:03:55.3072219Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:55.3099497Z dist init r=0, world=2 2022-05-18T05:03:55.3104275Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:55.3151799Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:55.3185847Z dist init r=1, world=2 2022-05-18T05:03:55.3191312Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:55.3192671Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:55.3207463Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:56.3652033Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:03:56.3652576Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:03:57.1971511Z ok (4.536s) 2022-05-18T05:03:57.2104571Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66461 2022-05-18T05:03:57.2215150Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66462 2022-05-18T05:03:59.8347191Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:59.8347903Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:03:59.8376926Z dist init r=0, world=2 2022-05-18T05:03:59.8378627Z dist init r=1, world=2 2022-05-18T05:03:59.8382241Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:03:59.8383717Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:03:59.8384728Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:03:59.8385527Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:00.8821315Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:00.8821854Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:01.6327266Z ok (4.435s) 2022-05-18T05:04:01.6463322Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66549 2022-05-18T05:04:01.6575594Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66550 2022-05-18T05:04:04.2689911Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:04.2717950Z dist init r=1, world=2 2022-05-18T05:04:04.2723611Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:04.2749842Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:04.2778917Z dist init r=0, world=2 2022-05-18T05:04:04.2783538Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:04.2784431Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:04.2827243Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:05.3191217Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:05.3191775Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:06.1693488Z ok (4.536s) 2022-05-18T05:04:06.1824345Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66637 2022-05-18T05:04:06.1931545Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66638 2022-05-18T05:04:08.7774925Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:08.7801409Z dist init r=0, world=2 2022-05-18T05:04:08.7806529Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:08.8294052Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:08.8323320Z dist init r=1, world=2 2022-05-18T05:04:08.8328400Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:08.8329734Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:08.8417739Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:09.8937319Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:09.8937850Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:10.7047508Z ok (4.535s) 2022-05-18T05:04:10.7183340Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66725 2022-05-18T05:04:10.7292231Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66726 2022-05-18T05:04:13.3119677Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:13.3146093Z dist init r=1, world=2 2022-05-18T05:04:13.3151049Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:13.3536846Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:13.3565374Z dist init r=0, world=2 2022-05-18T05:04:13.3570275Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:13.3571714Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:13.3660783Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:14.3840630Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:14.3841242Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:15.1405678Z ok (4.436s) 2022-05-18T05:04:15.1535930Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66813 2022-05-18T05:04:15.1643160Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66814 2022-05-18T05:04:17.7689124Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:17.7716405Z dist init r=0, world=2 2022-05-18T05:04:17.7721290Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:17.7789129Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:17.7818673Z dist init r=1, world=2 2022-05-18T05:04:17.7823537Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:17.7824566Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:17.7825278Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:18.8276128Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:18.8276648Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:19.6773254Z ok (4.537s) 2022-05-18T05:04:19.6902875Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66901 2022-05-18T05:04:19.7010734Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66902 2022-05-18T05:04:22.3085607Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:22.3113089Z dist init r=0, world=2 2022-05-18T05:04:22.3117836Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:22.3230474Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:22.3259796Z dist init r=1, world=2 2022-05-18T05:04:22.3264367Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:22.3265682Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:22.3322928Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:23.3775752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:23.3776274Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:24.2127399Z ok (4.535s) 2022-05-18T05:04:24.2261878Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 66989 2022-05-18T05:04:24.2373476Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 66990 2022-05-18T05:04:26.8385437Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:26.8411893Z dist init r=0, world=2 2022-05-18T05:04:26.8417990Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:26.8864655Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:26.8894112Z dist init r=1, world=2 2022-05-18T05:04:26.8899164Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:26.8900320Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:26.8927580Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:27.9262540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:27.9263152Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:28.7486994Z ok (4.536s) 2022-05-18T05:04:28.7619167Z test_mixed_precision_e2e_full_shard_mp_no_mp_offload_true_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67077 2022-05-18T05:04:28.7727086Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67078 2022-05-18T05:04:31.3723332Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:31.3750800Z dist init r=1, world=2 2022-05-18T05:04:31.3755989Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:31.4013862Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:31.4043016Z dist init r=0, world=2 2022-05-18T05:04:31.4047807Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:31.4048907Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:31.4062691Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:32.4184530Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:32.4185109Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:33.2841844Z ok (4.535s) 2022-05-18T05:04:33.2974884Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67165 2022-05-18T05:04:33.3085452Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67166 2022-05-18T05:04:35.8975650Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:35.9002013Z dist init r=1, world=2 2022-05-18T05:04:35.9007187Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:35.9228219Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:35.9256537Z dist init r=0, world=2 2022-05-18T05:04:35.9260886Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:35.9261941Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:35.9313687Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:36.9389724Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:36.9390285Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:37.7199054Z ok (4.436s) 2022-05-18T05:04:37.7329717Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67253 2022-05-18T05:04:37.7436389Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67254 2022-05-18T05:04:40.3521860Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:40.3548886Z dist init r=1, world=2 2022-05-18T05:04:40.3554020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:40.3594966Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:40.3624199Z dist init r=0, world=2 2022-05-18T05:04:40.3629810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:40.3630685Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:40.3657006Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:41.3910163Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:41.3910718Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:42.2549728Z ok (4.535s) 2022-05-18T05:04:42.2682239Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67341 2022-05-18T05:04:42.2790629Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67342 2022-05-18T05:04:44.9001107Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:44.9028294Z dist init r=0, world=2 2022-05-18T05:04:44.9033490Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:44.9273514Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:44.9303144Z dist init r=1, world=2 2022-05-18T05:04:44.9308022Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:44.9309270Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:44.9340054Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:45.9836724Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:45.9837251Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:46.7908119Z ok (4.536s) 2022-05-18T05:04:46.8042324Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67429 2022-05-18T05:04:46.8150752Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67430 2022-05-18T05:04:49.4156943Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:49.4183767Z dist init r=1, world=2 2022-05-18T05:04:49.4189016Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:49.4351809Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:49.4381197Z dist init r=0, world=2 2022-05-18T05:04:49.4386057Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:49.4387033Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:49.4393854Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:50.4555021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:50.4555569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:51.3263561Z ok (4.535s) 2022-05-18T05:04:51.3392685Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67517 2022-05-18T05:04:51.3499388Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67518 2022-05-18T05:04:53.9878930Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:53.9902975Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:53.9905986Z dist init r=1, world=2 2022-05-18T05:04:53.9911492Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:53.9931376Z dist init r=0, world=2 2022-05-18T05:04:53.9936228Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:53.9937247Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:54.0014242Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:55.0338124Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:04:55.0338904Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:55.8615868Z ok (4.535s) 2022-05-18T05:04:55.8745631Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67605 2022-05-18T05:04:55.8852324Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67606 2022-05-18T05:04:58.5091741Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:58.5119821Z dist init r=1, world=2 2022-05-18T05:04:58.5125282Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:04:58.5127767Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:04:58.5156121Z dist init r=0, world=2 2022-05-18T05:04:58.5160452Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:04:58.5161571Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:58.5228755Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:04:59.5654805Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:04:59.5655514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:00.3967973Z ok (4.535s) 2022-05-18T05:05:00.4097864Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67693 2022-05-18T05:05:00.4204340Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67694 2022-05-18T05:05:03.0497291Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:03.0524969Z dist init r=0, world=2 2022-05-18T05:05:03.0530245Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:03.0600339Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:03.0628350Z dist init r=1, world=2 2022-05-18T05:05:03.0633139Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:03.0635689Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:03.0636446Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:04.1037928Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:04.1038451Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:04.9318249Z ok (4.535s) 2022-05-18T05:05:04.9450854Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_false_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67781 2022-05-18T05:05:04.9557627Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67782 2022-05-18T05:05:07.5703507Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:07.5731387Z dist init r=1, world=2 2022-05-18T05:05:07.5737142Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:07.5754566Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:07.5783735Z dist init r=0, world=2 2022-05-18T05:05:07.5788201Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:07.5789199Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:07.5840507Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:08.6374164Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:08.6374720Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:09.4672671Z ok (4.535s) 2022-05-18T05:05:09.4802171Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67869 2022-05-18T05:05:09.4909193Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67870 2022-05-18T05:05:12.1252998Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:12.1281680Z dist init r=0, world=2 2022-05-18T05:05:12.1286306Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:12.1346856Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:12.1376577Z dist init r=1, world=2 2022-05-18T05:05:12.1381917Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:12.1382961Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:12.1390047Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:13.1866921Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:13.1867914Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:14.0038842Z ok (4.536s) 2022-05-18T05:05:14.0170261Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 67957 2022-05-18T05:05:14.0279842Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 67958 2022-05-18T05:05:16.6644668Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:16.6680219Z dist init r=1, world=2 2022-05-18T05:05:16.6684914Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:16.6756357Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:16.6788453Z dist init r=0, world=2 2022-05-18T05:05:16.6792970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:16.6793783Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:16.6890048Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:17.7416768Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:17.7417308Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:18.5392104Z ok (4.535s) 2022-05-18T05:05:18.5522095Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68045 2022-05-18T05:05:18.5630791Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68046 2022-05-18T05:05:21.1729072Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:21.1754872Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:21.1756246Z dist init r=0, world=2 2022-05-18T05:05:21.1761621Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:21.1784793Z dist init r=1, world=2 2022-05-18T05:05:21.1789549Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:21.1790642Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:21.1865138Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:22.2179964Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:22.2180500Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:23.0742895Z ok (4.535s) 2022-05-18T05:05:23.0877518Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68133 2022-05-18T05:05:23.0986054Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68134 2022-05-18T05:05:25.7142250Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:25.7181043Z dist init r=0, world=2 2022-05-18T05:05:25.7181535Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:25.7263295Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:25.7292113Z dist init r=1, world=2 2022-05-18T05:05:25.7297372Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:25.7298405Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:25.7378487Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:26.7727849Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:26.7728411Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:27.6102966Z ok (4.536s) 2022-05-18T05:05:27.6238500Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68221 2022-05-18T05:05:27.6349588Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68222 2022-05-18T05:05:30.2515905Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:30.2544121Z dist init r=0, world=2 2022-05-18T05:05:30.2549739Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:30.2557033Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:30.2586739Z dist init r=1, world=2 2022-05-18T05:05:30.2591327Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:30.2592447Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:30.2653287Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:31.3067592Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:31.3068442Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:32.1464300Z ok (4.536s) 2022-05-18T05:05:32.1600250Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68309 2022-05-18T05:05:32.1711796Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68310 2022-05-18T05:05:34.7740141Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:34.7767255Z dist init r=0, world=2 2022-05-18T05:05:34.7772384Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:34.7772816Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:34.7801577Z dist init r=1, world=2 2022-05-18T05:05:34.7806635Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:34.7807793Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:34.7876231Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:35.8229738Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:35.8230265Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:36.6827833Z ok (4.536s) 2022-05-18T05:05:36.6961388Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68397 2022-05-18T05:05:36.7071374Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68398 2022-05-18T05:05:39.2936227Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:39.2967479Z dist init r=0, world=2 2022-05-18T05:05:39.2972808Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:39.3283772Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:39.3313028Z dist init r=1, world=2 2022-05-18T05:05:39.3317938Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:39.3318798Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:39.3381826Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:40.3815963Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:40.3816482Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:41.2187117Z ok (4.536s) 2022-05-18T05:05:41.2319788Z test_mixed_precision_e2e_full_shard_mp_only_param_and_buf_offload_true_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68485 2022-05-18T05:05:41.2428070Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68486 2022-05-18T05:05:43.8240532Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:43.8267531Z dist init r=1, world=2 2022-05-18T05:05:43.8272718Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:43.8600246Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:43.8629243Z dist init r=0, world=2 2022-05-18T05:05:43.8634153Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:43.8635003Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:43.8681173Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:44.8780467Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:44.8781065Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:45.6558749Z ok (4.437s) 2022-05-18T05:05:45.6688679Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68573 2022-05-18T05:05:45.6796538Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68574 2022-05-18T05:05:48.2925941Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:48.2953565Z dist init r=1, world=2 2022-05-18T05:05:48.2958499Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:48.3124813Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:48.3154625Z dist init r=0, world=2 2022-05-18T05:05:48.3159493Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:48.3160608Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:48.3163478Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:49.3674882Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:49.3675546Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:50.1911155Z ok (4.535s) 2022-05-18T05:05:50.2044492Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68661 2022-05-18T05:05:50.2156116Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68662 2022-05-18T05:05:52.8113890Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:52.8141069Z dist init r=0, world=2 2022-05-18T05:05:52.8146550Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:52.8255763Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:52.8285134Z dist init r=1, world=2 2022-05-18T05:05:52.8289890Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:52.8291482Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:52.8351743Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:53.8841905Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:53.8842542Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:54.7272252Z ok (4.536s) 2022-05-18T05:05:54.7404930Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68749 2022-05-18T05:05:54.7515898Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68750 2022-05-18T05:05:57.3599822Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:57.3628332Z dist init r=0, world=2 2022-05-18T05:05:57.3633524Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:05:57.3680109Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:05:57.3709169Z dist init r=1, world=2 2022-05-18T05:05:57.3714023Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:05:57.3715378Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:57.3736643Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:05:58.4336591Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:05:58.4337619Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:05:59.2631717Z ok (4.536s) 2022-05-18T05:05:59.2767679Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68837 2022-05-18T05:05:59.2881414Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68838 2022-05-18T05:06:01.9114674Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:01.9141795Z dist init r=0, world=2 2022-05-18T05:06:01.9142282Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:01.9146999Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:01.9172132Z dist init r=1, world=2 2022-05-18T05:06:01.9176849Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:01.9177842Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:01.9250466Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:02.9726700Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:02.9729068Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:03.7994274Z ok (4.536s) 2022-05-18T05:06:03.8124307Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 68925 2022-05-18T05:06:03.8233559Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 68926 2022-05-18T05:06:06.4329905Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:06.4357290Z dist init r=1, world=2 2022-05-18T05:06:06.4362611Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:06.4365109Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:06.4393542Z dist init r=0, world=2 2022-05-18T05:06:06.4397933Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:06.4398868Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:06.4465834Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:07.4932736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:07.4933282Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:08.3349253Z ok (4.535s) 2022-05-18T05:06:08.3483350Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69013 2022-05-18T05:06:08.3594304Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69014 2022-05-18T05:06:10.9596222Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:10.9623650Z dist init r=0, world=2 2022-05-18T05:06:10.9628845Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:10.9959779Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:10.9988395Z dist init r=1, world=2 2022-05-18T05:06:10.9992966Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:10.9993849Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:11.0037444Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:12.0355552Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:12.0356090Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:12.8709720Z ok (4.536s) 2022-05-18T05:06:12.8844895Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69101 2022-05-18T05:06:12.8961393Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69102 2022-05-18T05:06:15.5566705Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:15.5594472Z dist init r=0, world=2 2022-05-18T05:06:15.5599747Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:15.5643653Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:15.5673433Z dist init r=1, world=2 2022-05-18T05:06:15.5678425Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:15.5679245Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:15.5703487Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:16.6191568Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:16.6192118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:17.4092097Z ok (4.538s) 2022-05-18T05:06:17.4225799Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_false_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69189 2022-05-18T05:06:17.4335425Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69190 2022-05-18T05:06:20.0365901Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:20.0366608Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:20.0393616Z dist init r=0, world=2 2022-05-18T05:06:20.0394098Z dist init r=1, world=2 2022-05-18T05:06:20.0398895Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:20.0399971Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:20.0401338Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:20.0402677Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:21.0698127Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:21.0699169Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:21.8449956Z ok (4.436s) 2022-05-18T05:06:21.8581446Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_post_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69277 2022-05-18T05:06:21.8690492Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69278 2022-05-18T05:06:24.4637877Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:24.4664253Z dist init r=0, world=2 2022-05-18T05:06:24.4669332Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:24.4910129Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:24.4939209Z dist init r=1, world=2 2022-05-18T05:06:24.4943897Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:24.4944767Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:24.4976308Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:25.5392726Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:25.5393269Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:26.2804319Z ok (4.435s) 2022-05-18T05:06:26.2933988Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_post_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69365 2022-05-18T05:06:26.3042823Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69366 2022-05-18T05:06:28.9009567Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:28.9036443Z dist init r=1, world=2 2022-05-18T05:06:28.9041297Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:28.9189064Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:28.9218106Z dist init r=0, world=2 2022-05-18T05:06:28.9222913Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:28.9223722Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:28.9246104Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:29.9346086Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:29.9346863Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:30.7156264Z ok (4.435s) 2022-05-18T05:06:30.7286250Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_post_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69453 2022-05-18T05:06:30.7394951Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69454 2022-05-18T05:06:33.3990758Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:33.4017723Z dist init r=1, world=2 2022-05-18T05:06:33.4022646Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:33.4170859Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:33.4199496Z dist init r=0, world=2 2022-05-18T05:06:33.4204628Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:33.4205713Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:33.4227565Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:34.4679541Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:34.4680074Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:35.2505895Z ok (4.535s) 2022-05-18T05:06:35.2637039Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_post_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69541 2022-05-18T05:06:35.2746309Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69542 2022-05-18T05:06:37.9031269Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:37.9031699Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:37.9058593Z dist init r=1, world=2 2022-05-18T05:06:37.9060712Z dist init r=0, world=2 2022-05-18T05:06:37.9063962Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:37.9065124Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:37.9066110Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:37.9067277Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:38.9331554Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:38.9332089Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:39.7859303Z ok (4.535s) 2022-05-18T05:06:39.7988387Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_pre_fp32_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69629 2022-05-18T05:06:39.8096151Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69630 2022-05-18T05:06:42.3554167Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:42.3581129Z dist init r=0, world=2 2022-05-18T05:06:42.3586433Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:42.4043370Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:42.4073219Z dist init r=1, world=2 2022-05-18T05:06:42.4078227Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:42.4079041Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:42.4096011Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:43.4439407Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:43.4439954Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:44.2209081Z ok (4.435s) 2022-05-18T05:06:44.2337617Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_pre_fp32_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69717 2022-05-18T05:06:44.2446232Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69718 2022-05-18T05:06:46.8552068Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:46.8579477Z dist init r=1, world=2 2022-05-18T05:06:46.8584518Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:46.8760996Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:46.8790144Z dist init r=0, world=2 2022-05-18T05:06:46.8795027Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:46.8795838Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:46.8891117Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:47.8983215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:47.8983755Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:48.7560153Z ok (4.535s) 2022-05-18T05:06:48.7690127Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_pre_fp64_none (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69805 2022-05-18T05:06:48.7796992Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69806 2022-05-18T05:06:51.3853539Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:51.3880544Z dist init r=1, world=2 2022-05-18T05:06:51.3885737Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:51.4107104Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:51.4135890Z dist init r=0, world=2 2022-05-18T05:06:51.4140840Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:51.4141667Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:51.4192437Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:52.4300148Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:52.4300672Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:53.2910626Z ok (4.535s) 2022-05-18T05:06:53.3044989Z test_mixed_precision_e2e_full_shard_mp_only_reduce_offload_true_prefetch_pre_fp64_sharded_grad_scaler (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69893 2022-05-18T05:06:53.3155407Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69894 2022-05-18T05:06:55.9555422Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:55.9583709Z dist init r=0, world=2 2022-05-18T05:06:55.9588837Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:06:55.9685189Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:06:55.9714584Z dist init r=1, world=2 2022-05-18T05:06:55.9719775Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:06:55.9720584Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:55.9794401Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:06:57.0223709Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:06:57.0224729Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:06:57.8269689Z ok (4.536s) 2022-05-18T05:06:57.8401258Z test_mixed_precision_no_reshard_after_forward (__main__.TestFSDPMixedPrecisionSharded) ... 
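Note: the long run of test_mixed_precision_e2e_full_shard_* cases above is parameterized over the FSDP options their names encode (mixed precision for params/buffers vs. gradient reduction, CPU offload on/off, pre/post backward prefetch, fp32/fp64, with or without a sharded grad scaler). The snippet below is a minimal illustrative sketch of that kind of configuration, not the test harness itself; the toy model is made up, and the exact import surface depends on the PyTorch version.

```python
# Minimal sketch (assumes an initialized process group and a CUDA device set):
# the FSDP knobs that the parameterized test names above refer to.
import torch
import torch.nn as nn
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    CPUOffload,
    BackwardPrefetch,
)

mp_policy = MixedPrecision(
    param_dtype=torch.float16,   # cast params for compute
    buffer_dtype=torch.float16,  # cast buffers ("mp_only_param_and_buf"-style)
    reduce_dtype=torch.float16,  # or keep gradient reduction in full precision
)

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8)).cuda()
fsdp_model = FSDP(
    model,
    mixed_precision=mp_policy,
    cpu_offload=CPUOffload(offload_params=True),       # "offload_true" variants
    backward_prefetch=BackwardPrefetch.BACKWARD_PRE,   # "prefetch_pre" variants
)
```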
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 69981 2022-05-18T05:06:57.8509488Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 69982 2022-05-18T05:07:00.4463377Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:00.4489349Z dist init r=0, world=2 2022-05-18T05:07:00.4494825Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:00.4704928Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:00.4734245Z dist init r=1, world=2 2022-05-18T05:07:00.4739092Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:00.4740175Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:00.4801556Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:01.5320013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:01.5320642Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:02.3623186Z ok (4.535s) 2022-05-18T05:07:02.3642031Z test_mixed_precision_resnet (__main__.TestFSDPMixedPrecisionSharded) 2022-05-18T05:07:02.3643033Z End to end test to ensure mixed precision + auto_wrap works ... skip: no torchvision (0.002s) 2022-05-18T05:07:02.3789789Z test_mp_batchnorm_convert_sync_bn_False (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70069 2022-05-18T05:07:02.3898925Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70070 2022-05-18T05:07:05.0143127Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:05.0169726Z dist init r=0, world=2 2022-05-18T05:07:05.0175516Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:05.0222386Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:05.0251737Z dist init r=1, world=2 2022-05-18T05:07:05.0256513Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:05.0257379Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:05.0278609Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:06.0839036Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:06.0839576Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:07.1019119Z ok (4.737s) 2022-05-18T05:07:07.1166125Z test_mp_batchnorm_convert_sync_bn_True (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70157 2022-05-18T05:07:07.1273592Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70158 2022-05-18T05:07:09.7173163Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:09.7199242Z dist init r=0, world=2 2022-05-18T05:07:09.7204548Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:09.7422874Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:09.7451950Z dist init r=1, world=2 2022-05-18T05:07:09.7456926Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:09.7457974Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:09.7511259Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:10.8023326Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:10.8023874Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:11.5382842Z ok (4.436s) 2022-05-18T05:07:11.5513724Z test_mp_embedding_default (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70245 2022-05-18T05:07:11.5623077Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70246 2022-05-18T05:07:14.1548386Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:14.1548793Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:14.1575653Z dist init r=0, world=2 2022-05-18T05:07:14.1575913Z dist init r=1, world=2 2022-05-18T05:07:14.1581105Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:14.1581639Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:14.1582617Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:14.1585184Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:15.1867943Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:15.1868743Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:15.2171234Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:07:15.2172286Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:07:15.2202797Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:07:15.2203694Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:07:15.9734180Z ok (4.435s) 2022-05-18T05:07:15.9863517Z test_mp_embedding_only_params_and_bufs (__main__.TestFSDPMixedPrecisionSharded) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70333 2022-05-18T05:07:15.9973363Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70334 2022-05-18T05:07:18.5784262Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:18.5810835Z dist init r=1, world=2 2022-05-18T05:07:18.5815687Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:18.5898198Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:18.5925741Z dist init r=0, world=2 2022-05-18T05:07:18.5930619Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:18.5931660Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:18.6020566Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:19.6209230Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:19.6209812Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:19.6529189Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:07:19.6529881Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:07:19.6531006Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:07:19.6531640Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:07:20.4085883Z ok (4.435s) 2022-05-18T05:07:20.4215236Z test_mp_embedding_params_and_reduce_diff (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70421 2022-05-18T05:07:20.4324876Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70422 2022-05-18T05:07:23.0708373Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:23.0736289Z dist init r=1, world=2 2022-05-18T05:07:23.0741740Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:23.0825418Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:23.0855426Z dist init r=0, world=2 2022-05-18T05:07:23.0861055Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:23.0862590Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:23.0947253Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:07:24.1408362Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:24.1408904Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:24.1730196Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:07:24.1731133Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:07:24.1767664Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:07:24.1768304Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:07:25.0455821Z ok (4.637s) 2022-05-18T05:07:25.0590206Z test_mp_embedding_reduce (__main__.TestFSDPMixedPrecisionSharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70509 2022-05-18T05:07:25.0700856Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70510 2022-05-18T05:07:27.6669648Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:27.6696359Z dist init r=0, world=2 2022-05-18T05:07:27.6701383Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:27.6800573Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:27.6829830Z dist init r=1, world=2 2022-05-18T05:07:27.6834594Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:27.6835787Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:27.6906719Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:28.7297269Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:28.7297799Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:28.7610604Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:07:28.7611567Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:07:28.7613271Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:07:28.7613903Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:07:29.8823601Z ok (4.837s) 2022-05-18T05:07:29.8957512Z test_mixed_precision_e2e_full_shard (__main__.TestFSDPMixedPrecisionUnsharded) ... 
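Note: the repeated UserWarning above ("Module is input on CPU, we are moving it to N to perform parameter verification, flattening, sharding, and will move it back after") is FSDP shuttling a CPU-resident module to the current CUDA device during wrapping. A hedged sketch of how a caller can avoid that round-trip follows; it assumes a process group is already initialized and torch.cuda.set_device(rank) has been called, and the embedding module is illustrative only.

```python
# Minimal sketch: constructing the module on the target GPU before wrapping
# avoids the "Module is input on CPU" move warned about in the log above.
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def wrap_embedding(rank: int) -> FSDP:
    emb = nn.Embedding(100, 16).cuda(rank)  # already on the right device
    return FSDP(emb)                         # no CPU -> GPU shuffle at wrap time
```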
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70597 2022-05-18T05:07:32.4204725Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:32.4230677Z dist init r=0, world=1 2022-05-18T05:07:32.4235431Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:32.4236757Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:07:32.4760659Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:33.1036322Z ok (3.221s) 2022-05-18T05:07:33.1167267Z test_mixed_precision_no_reshard_after_forward (__main__.TestFSDPMixedPrecisionUnsharded) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70641 2022-05-18T05:07:35.6373225Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:35.6400058Z dist init r=0, world=1 2022-05-18T05:07:35.6404765Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:35.6405824Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:07:35.6930967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:36.3254548Z ok (3.222s) 2022-05-18T05:07:36.3254922Z 2022-05-18T05:07:36.3255442Z ---------------------------------------------------------------------- 2022-05-18T05:07:36.3255800Z Ran 90 tests in 399.377s 2022-05-18T05:07:36.3255970Z 2022-05-18T05:07:36.3256087Z OK (skipped=1) 2022-05-18T05:07:36.3256225Z 2022-05-18T05:07:36.3256354Z Generating XML reports... 2022-05-18T05:07:36.3407741Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_mixed_precision/TEST-TestFSDPMixedPrecisionSharded-20220518050056.xml 2022-05-18T05:07:36.3411781Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_mixed_precision/TEST-TestFSDPMixedPrecisionUnsharded-20220518050056.xml 2022-05-18T05:07:36.6184982Z Running distributed/test_c10d_gloo ... [2022-05-18 05:07:36.618019] 2022-05-18T05:07:36.6185720Z Executing ['/opt/conda/bin/python', 'distributed/test_c10d_gloo.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 05:07:36.618119] 2022-05-18T05:07:37.5216612Z , <__main__.CommTest testMethod=test_broadcast_coalesced_gloo_cuda>, <__main__.CommTest testMethod=test_gloo_barrier_device_ids>, <__main__.CommTest testMethod=test_gloo_warn_not_in_group>, <__main__.CommTest testMethod=test_sequence_num_incremented_gloo_default>, <__main__.CommTest testMethod=test_sequence_num_incremented_gloo_subgroup>, <__main__.CommTest testMethod=test_sequence_num_set_default_pg_gloo>, <__main__.CommTest testMethod=test_sequence_num_set_gloo_new_group>]> 2022-05-18T05:07:37.5217717Z test_broadcast_coalesced_gloo_cpu (__main__.CommTest) 2022-05-18T05:07:37.5218111Z test_broadcast_coalesced_gloo_cuda (__main__.CommTest) 2022-05-18T05:07:37.5218465Z test_gloo_barrier_device_ids (__main__.CommTest) 2022-05-18T05:07:37.5218779Z test_gloo_warn_not_in_group (__main__.CommTest) 2022-05-18T05:07:37.5219139Z test_sequence_num_incremented_gloo_default (__main__.CommTest) 2022-05-18T05:07:37.5219523Z test_sequence_num_incremented_gloo_subgroup (__main__.CommTest) 2022-05-18T05:07:37.5219883Z test_sequence_num_set_default_pg_gloo (__main__.CommTest) 2022-05-18T05:07:37.5220238Z test_sequence_num_set_gloo_new_group (__main__.CommTest) 2022-05-18T05:07:37.5228798Z , <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_dynamic_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_once_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_static_graph_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_twice_weight_sharing>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_unused_params_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_False>, <__main__.DistributedDataParallelTest testMethod=test_ddp_checkpointing_weight_sharing_use_reentrant_True>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_future_passing_cpu>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_future_passing_gpu_gloo>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_register_just_once>, <__main__.DistributedDataParallelTest testMethod=test_ddp_comm_hook_sparse_gradients>, <__main__.DistributedDataParallelTest testMethod=test_ddp_invalid_comm_hook_init>, <__main__.DistributedDataParallelTest testMethod=test_ddp_invalid_comm_hook_return_type>, <__main__.DistributedDataParallelTest testMethod=test_find_unused_parameters_when_unused_parameters_empty>, <__main__.DistributedDataParallelTest testMethod=test_global_local_unused_params_grad>, <__main__.DistributedDataParallelTest testMethod=test_global_local_unused_params_grad_with_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_global_local_unused_params_grad_with_static_graph>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_1gpu_module_device_ids_integer_list>, 
<__main__.DistributedDataParallelTest testMethod=test_gloo_backend_1gpu_module_device_ids_torch_device_list>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_2gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_4gpu_module>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_cpu_module>, <__main__.DistributedDataParallelTest testMethod=test_gloo_backend_cpu_module_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_ignored_output>, <__main__.DistributedDataParallelTest testMethod=test_ignored_output_with_unused_parameters>, <__main__.DistributedDataParallelTest testMethod=test_invalid_powerSGD_state>, <__main__.DistributedDataParallelTest testMethod=test_save_load_checkpoint>, <__main__.DistributedDataParallelTest testMethod=test_sparse_gradients>, <__main__.DistributedDataParallelTest testMethod=test_sparse_gradients_grad_is_view>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_empty_input>, <__main__.DistributedDataParallelTest testMethod=test_sync_batch_norm_only_empty_input>]> 2022-05-18T05:07:37.5236657Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5237155Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5237650Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5238127Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5238635Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5239163Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5239669Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5240136Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5240706Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5241207Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5241700Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5242264Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5242785Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5243316Z test_ddp_comm_hook_future_passing_cpu (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5243779Z test_ddp_comm_hook_future_passing_gpu_gloo (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5244217Z test_ddp_comm_hook_register_just_once (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5244662Z test_ddp_comm_hook_sparse_gradients (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5245108Z test_ddp_invalid_comm_hook_init (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5245535Z test_ddp_invalid_comm_hook_return_type (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5246021Z test_find_unused_parameters_when_unused_parameters_empty (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5246503Z test_global_local_unused_params_grad (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5246976Z 
test_global_local_unused_params_grad_with_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5247455Z test_global_local_unused_params_grad_with_static_graph (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5247951Z test_gloo_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5248451Z test_gloo_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5248903Z test_gloo_backend_2gpu_module (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5249340Z test_gloo_backend_4gpu_module (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5249769Z test_gloo_backend_cpu_module (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5250213Z test_gloo_backend_cpu_module_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5251740Z test_ignored_output (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5252188Z test_ignored_output_with_unused_parameters (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5252638Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5253042Z test_save_load_checkpoint (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5253456Z test_sparse_gradients (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5253881Z test_sparse_gradients_grad_is_view (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5254307Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5254757Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) 2022-05-18T05:07:37.5255138Z 2022-05-18T05:07:37.5260372Z , <__main__.ProcessGroupGlooTest testMethod=test_allgather_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_coalesced_async>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_coalesced_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_noncontiguous_input>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_stress>, <__main__.ProcessGroupGlooTest testMethod=test_allgather_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics_cuda_using_work_api>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_basics_using_work_api>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_async>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_basics>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_checks>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_checks_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_coalesced_stress>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_stress>, <__main__.ProcessGroupGlooTest testMethod=test_allreduce_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_barrier_implies_wait>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_basics>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_checks>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_stress>, <__main__.ProcessGroupGlooTest testMethod=test_broadcast_stress_cuda>, <__main__.ProcessGroupGlooTest 
testMethod=test_empty_tensors>, <__main__.ProcessGroupGlooTest testMethod=test_gather_basics>, <__main__.ProcessGroupGlooTest testMethod=test_gather_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_gather_checks>, <__main__.ProcessGroupGlooTest testMethod=test_gather_noncontiguous_input>, <__main__.ProcessGroupGlooTest testMethod=test_gather_stress>, <__main__.ProcessGroupGlooTest testMethod=test_gather_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_multi_device_constructor>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_basics>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_checks>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_stress>, <__main__.ProcessGroupGlooTest testMethod=test_reduce_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_round_robin>, <__main__.ProcessGroupGlooTest testMethod=test_round_robin_create_destroy>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_basics>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_checks>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_stress>, <__main__.ProcessGroupGlooTest testMethod=test_scatter_stress_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_send_recv_all_to_all>, <__main__.ProcessGroupGlooTest testMethod=test_sparse_allreduce_basics>, <__main__.ProcessGroupGlooTest testMethod=test_sparse_allreduce_basics_cuda>, <__main__.ProcessGroupGlooTest testMethod=test_sparse_allreduce_checks>]> 2022-05-18T05:07:37.5265512Z test_allgather_basics (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5265903Z test_allgather_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5266283Z test_allgather_checks (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5266650Z test_allgather_coalesced_async (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5267058Z test_allgather_coalesced_checks (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5267468Z test_allgather_noncontiguous_input (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5267845Z test_allgather_stress (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5268223Z test_allgather_stress_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5268597Z test_allreduce_basics (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5268958Z test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5269368Z test_allreduce_basics_cuda_using_work_api (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5269794Z test_allreduce_basics_using_work_api (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5270181Z test_allreduce_checks (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5270548Z test_allreduce_coalesced_async (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5270945Z test_allreduce_coalesced_basics (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5271343Z test_allreduce_coalesced_checks (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5271858Z test_allreduce_coalesced_checks_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5272266Z test_allreduce_coalesced_stress (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5272647Z test_allreduce_stress (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5273006Z test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5273446Z test_barrier_implies_wait (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5273832Z test_broadcast_basics 
(__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5274206Z test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5274564Z test_broadcast_checks (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5274933Z test_broadcast_stress (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5275307Z test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5275658Z test_empty_tensors (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5276022Z test_gather_basics (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5276390Z test_gather_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5276739Z test_gather_checks (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5277119Z test_gather_noncontiguous_input (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5277498Z test_gather_stress (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5277852Z test_gather_stress_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5278239Z test_multi_device_constructor (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5278614Z test_reduce_basics (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5278979Z test_reduce_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5279323Z test_reduce_checks (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5279677Z test_reduce_stress (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5280040Z test_reduce_stress_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5280381Z test_round_robin (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5280762Z test_round_robin_create_destroy (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5281141Z test_scatter_basics (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5281494Z test_scatter_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5281860Z test_scatter_checks (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5282219Z test_scatter_stress (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5282591Z test_scatter_stress_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5282954Z test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5283336Z test_sparse_allreduce_basics (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5283733Z test_sparse_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5284116Z test_sparse_allreduce_checks (__main__.ProcessGroupGlooTest) 2022-05-18T05:07:37.5284975Z , <__main__.ReducerTest testMethod=test_forward_backward_optimizer>, <__main__.ReducerTest testMethod=test_forward_backward_unused_parameters>, <__main__.ReducerTest testMethod=test_multi_dtype_multi_bucket>, <__main__.ReducerTest testMethod=test_multi_dtype_single_bucket>, <__main__.ReducerTest testMethod=test_single_dtype_single_bucket>]> 2022-05-18T05:07:37.5285772Z test_forward_backward (__main__.ReducerTest) 2022-05-18T05:07:37.5286123Z test_forward_backward_optimizer (__main__.ReducerTest) 2022-05-18T05:07:37.5286480Z test_forward_backward_unused_parameters (__main__.ReducerTest) 2022-05-18T05:07:37.5286851Z test_multi_dtype_multi_bucket (__main__.ReducerTest) 2022-05-18T05:07:37.5287203Z test_multi_dtype_single_bucket (__main__.ReducerTest) 2022-05-18T05:07:37.5287555Z test_single_dtype_single_bucket (__main__.ReducerTest) 2022-05-18T05:07:37.5287964Z ]> 2022-05-18T05:07:37.5288377Z test_logging_init (__main__.RendezvousEnvTest) 2022-05-18T05:07:37.5288777Z 2022-05-18T05:07:37.5289181Z ]> 2022-05-18T05:07:37.5289605Z test_default_store_timeout_gloo (__main__.TimeoutTest) 
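Note: each CommTest run that follows spins up a fresh two-rank gloo process group, and its initialization path logs the "store_based_barrier_key" INFO lines seen below. The sketch here is not the CommTest harness; the rendezvous address/port and the tiny all_reduce are illustrative assumptions showing the same init pattern.

```python
# Minimal sketch: a two-process gloo group whose init produces the
# store_based_barrier_key INFO messages, followed by one collective.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"   # illustrative rendezvous settings
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    t = torch.full((1,), float(rank))
    dist.all_reduce(t)                         # ranks 0 and 1 -> tensor([1.]) on both
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```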
2022-05-18T05:07:38.4232220Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:07:38.4247018Z 2022-05-18T05:07:38.4247633Z Running tests... 2022-05-18T05:07:38.4248142Z ---------------------------------------------------------------------- 2022-05-18T05:07:40.0839832Z test_broadcast_coalesced_gloo_cpu (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:40.1201646Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70757 2022-05-18T05:07:40.1311337Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70758 2022-05-18T05:07:41.0160388Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:41.0561260Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:41.3355482Z ok (2.911s) 2022-05-18T05:07:41.3355916Z 2022-05-18T05:07:41.3356698Z ---------------------------------------------------------------------- 2022-05-18T05:07:41.3357126Z Ran 1 test in 2.911s 2022-05-18T05:07:41.3357298Z 2022-05-18T05:07:41.3357397Z OK 2022-05-18T05:07:41.3357538Z 2022-05-18T05:07:41.3357677Z Generating XML reports... 2022-05-18T05:07:41.3401637Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050738.xml 2022-05-18T05:07:42.5119249Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:07:42.5135498Z 2022-05-18T05:07:42.5136006Z Running tests... 2022-05-18T05:07:42.5136517Z ---------------------------------------------------------------------- 2022-05-18T05:07:44.1760013Z test_broadcast_coalesced_gloo_cuda (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:44.2120258Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70872 2022-05-18T05:07:44.2230533Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70873 2022-05-18T05:07:45.1476013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:45.1827909Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:46.8307740Z ok (4.317s) 2022-05-18T05:07:46.8307975Z 2022-05-18T05:07:46.8308384Z ---------------------------------------------------------------------- 2022-05-18T05:07:46.8308735Z Ran 1 test in 4.317s 2022-05-18T05:07:46.8308908Z 2022-05-18T05:07:46.8308992Z OK 2022-05-18T05:07:46.8309133Z 2022-05-18T05:07:46.8309268Z Generating XML reports... 2022-05-18T05:07:46.8351594Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050742.xml 2022-05-18T05:07:47.9964740Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:07:47.9979721Z 2022-05-18T05:07:47.9980007Z Running tests... 2022-05-18T05:07:47.9980464Z ---------------------------------------------------------------------- 2022-05-18T05:07:49.6341161Z test_gloo_barrier_device_ids (__main__.CommTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:49.6697412Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 70989 2022-05-18T05:07:49.6806352Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 70990 2022-05-18T05:07:50.6074635Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:50.6091650Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:50.6284955Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:50.6285475Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:50.6286534Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:50.6287393Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:50.7846243Z ok (2.786s) 2022-05-18T05:07:50.7846763Z 2022-05-18T05:07:50.7847168Z ---------------------------------------------------------------------- 2022-05-18T05:07:50.7847523Z Ran 1 test in 2.787s 2022-05-18T05:07:50.7847693Z 2022-05-18T05:07:50.7847790Z OK 2022-05-18T05:07:50.7847908Z 2022-05-18T05:07:50.7848051Z Generating XML reports... 2022-05-18T05:07:50.7892125Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050747.xml 2022-05-18T05:07:51.9543253Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:07:51.9558877Z 2022-05-18T05:07:51.9559002Z Running tests... 2022-05-18T05:07:51.9559768Z ---------------------------------------------------------------------- 2022-05-18T05:07:53.5968481Z test_gloo_warn_not_in_group (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:53.6320828Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71104 2022-05-18T05:07:53.6429584Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71105 2022-05-18T05:07:54.5787444Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:07:54.5981505Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:07:54.6099791Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:07:54.6100324Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:07:54.6101128Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:54.6101829Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:07:54.6106869Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T05:07:54.6205210Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T05:07:54.6205890Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 
2022-05-18T05:07:54.6209864Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T05:07:56.2505940Z ok (4.294s) 2022-05-18T05:07:56.2506259Z 2022-05-18T05:07:56.2506817Z ---------------------------------------------------------------------- 2022-05-18T05:07:56.2507197Z Ran 1 test in 4.295s 2022-05-18T05:07:56.2507368Z 2022-05-18T05:07:56.2507464Z OK 2022-05-18T05:07:56.2507600Z 2022-05-18T05:07:56.2507741Z Generating XML reports... 2022-05-18T05:07:56.2551318Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050751.xml 2022-05-18T05:07:57.4238141Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:07:57.4253818Z 2022-05-18T05:07:57.4253974Z Running tests... 2022-05-18T05:07:57.4254434Z ---------------------------------------------------------------------- 2022-05-18T05:07:59.0833319Z test_sequence_num_incremented_gloo_default (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:07:59.1194576Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71224 2022-05-18T05:07:59.1304691Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71225 2022-05-18T05:08:00.0181645Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:00.0288548Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:00.0499630Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:08:00.0500154Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:08:00.0501123Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:08:00.0501878Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:08:00.0608580Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T05:08:00.0609096Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T05:08:00.0609790Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T05:08:00.0610937Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T05:08:01.7380624Z ok (4.312s) 2022-05-18T05:08:01.7380851Z 2022-05-18T05:08:01.7381274Z ---------------------------------------------------------------------- 2022-05-18T05:08:01.7381621Z Ran 1 test in 4.313s 2022-05-18T05:08:01.7381770Z 2022-05-18T05:08:01.7381874Z OK 2022-05-18T05:08:01.7382009Z 2022-05-18T05:08:01.7382145Z Generating XML reports... 2022-05-18T05:08:01.7424921Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050757.xml 2022-05-18T05:08:02.9354046Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:08:02.9370862Z 2022-05-18T05:08:02.9371153Z Running tests... 
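The *_new_group and warn-not-in-group cases above create a second process group, which is what produces the second store-based barrier (store_based_barrier_key:2). A rough sketch of that pattern, assuming the default gloo group from the previous sketch is already initialized (tensor shape and subgroup membership are illustrative):

import torch
import torch.distributed as dist

def subgroup_broadcast(rank: int) -> None:
    # Creating a new group runs another store-based barrier (key:2 in the log).
    subgroup = dist.new_group(ranks=[0])
    t = torch.zeros(4)
    # Members broadcast within the subgroup; a rank that is not a member should
    # get a warning rather than hang, which is roughly what
    # test_gloo_warn_not_in_group exercises.
    dist.broadcast(t, src=0, group=subgroup)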
2022-05-18T05:08:02.9371629Z ---------------------------------------------------------------------- 2022-05-18T05:08:04.5982193Z test_sequence_num_incremented_gloo_subgroup (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:04.6343044Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71347 2022-05-18T05:08:04.6451883Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71348 2022-05-18T05:08:05.5778876Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:05.5866392Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:05.7493985Z skip: Need at least 4 CUDA devices (2.812s) 2022-05-18T05:08:05.7494231Z 2022-05-18T05:08:05.7494649Z ---------------------------------------------------------------------- 2022-05-18T05:08:05.7494997Z Ran 1 test in 2.812s 2022-05-18T05:08:05.7495164Z 2022-05-18T05:08:05.7495278Z OK (skipped=1) 2022-05-18T05:08:05.7495418Z 2022-05-18T05:08:05.7495568Z Generating XML reports... 2022-05-18T05:08:05.7550976Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050802.xml 2022-05-18T05:08:06.9123900Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:08:06.9139336Z 2022-05-18T05:08:06.9139499Z Running tests... 2022-05-18T05:08:06.9139973Z ---------------------------------------------------------------------- 2022-05-18T05:08:08.5563352Z test_sequence_num_set_default_pg_gloo (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:08.5923907Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71456 2022-05-18T05:08:08.6034000Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71457 2022-05-18T05:08:09.4996386Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:09.5239081Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:09.5348970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:08:09.5349492Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:08:09.5350420Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:08:09.5351145Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:08:09.7077286Z ok (2.793s) 2022-05-18T05:08:09.7077519Z 2022-05-18T05:08:09.7077889Z ---------------------------------------------------------------------- 2022-05-18T05:08:09.7078233Z Ran 1 test in 2.794s 2022-05-18T05:08:09.7078400Z 2022-05-18T05:08:09.7078498Z OK 2022-05-18T05:08:09.7078636Z 2022-05-18T05:08:09.7079881Z Generating XML reports... 2022-05-18T05:08:09.7123332Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050806.xml 2022-05-18T05:08:10.8609502Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:08:10.8624169Z 2022-05-18T05:08:10.8624518Z Running tests... 
2022-05-18T05:08:10.8624981Z ---------------------------------------------------------------------- 2022-05-18T05:08:12.4920611Z test_sequence_num_set_gloo_new_group (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:12.5276891Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71571 2022-05-18T05:08:12.5386780Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71572 2022-05-18T05:08:13.5028376Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:13.5179851Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:13.5338658Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:08:13.5339216Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:08:13.5339995Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:08:13.5340688Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:08:13.5548364Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 0 2022-05-18T05:08:13.5549372Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:2 to store for rank: 1 2022-05-18T05:08:13.5550700Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T05:08:13.5552037Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:2 with 2 nodes. 2022-05-18T05:08:13.7430930Z ok (2.880s) 2022-05-18T05:08:13.7431165Z 2022-05-18T05:08:13.7431570Z ---------------------------------------------------------------------- 2022-05-18T05:08:13.7431915Z Ran 1 test in 2.881s 2022-05-18T05:08:13.7432063Z 2022-05-18T05:08:13.7432163Z OK 2022-05-18T05:08:13.7432303Z 2022-05-18T05:08:13.7432438Z Generating XML reports... 2022-05-18T05:08:13.7474822Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050810.xml 2022-05-18T05:08:14.9123498Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:08:14.9138872Z 2022-05-18T05:08:14.9139274Z Running tests... 2022-05-18T05:08:14.9139775Z ---------------------------------------------------------------------- 2022-05-18T05:08:14.9147783Z test_ddp_checkpointing_dynamic_module (__main__.DistributedDataParallelTest) 2022-05-18T05:08:16.5661043Z Dynamic module can be checkpointed, multiple times, with non-reentrant ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:16.6021539Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71692 2022-05-18T05:08:16.6132118Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71693 2022-05-18T05:08:17.5030768Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:17.5165667Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:18.8607084Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpka_ro404 2022-05-18T05:08:18.8607696Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpka_ro404/_remote_module_non_scriptable.py 2022-05-18T05:08:18.9056197Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeya_6x7r 2022-05-18T05:08:18.9058143Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeya_6x7r/_remote_module_non_scriptable.py 2022-05-18T05:08:19.6230299Z ok (4.709s) 2022-05-18T05:08:19.6230541Z 2022-05-18T05:08:19.6230915Z ---------------------------------------------------------------------- 2022-05-18T05:08:19.6231271Z Ran 1 test in 4.709s 2022-05-18T05:08:19.6231439Z 2022-05-18T05:08:19.6231540Z OK 2022-05-18T05:08:19.6231678Z 2022-05-18T05:08:19.6231811Z Generating XML reports... 2022-05-18T05:08:19.6275497Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050814.xml 2022-05-18T05:08:20.8225420Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:08:20.8239715Z 2022-05-18T05:08:20.8240241Z Running tests... 2022-05-18T05:08:20.8240886Z ---------------------------------------------------------------------- 2022-05-18T05:08:20.8249418Z test_ddp_checkpointing_dynamic_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T05:08:22.5001838Z Dynamic module can be checkpointed multiple times with weight sharing ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:22.5356329Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71813 2022-05-18T05:08:22.5465681Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71814 2022-05-18T05:08:23.4517899Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:23.4540357Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:24.7988352Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwfmggk32 2022-05-18T05:08:24.7988982Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwfmggk32/_remote_module_non_scriptable.py 2022-05-18T05:08:24.8286841Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphlxwkpa2 2022-05-18T05:08:24.8289123Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphlxwkpa2/_remote_module_non_scriptable.py 2022-05-18T05:08:25.4548657Z ok (4.631s) 2022-05-18T05:08:25.4548893Z 2022-05-18T05:08:25.4549280Z ---------------------------------------------------------------------- 2022-05-18T05:08:25.4549634Z Ran 1 test in 4.631s 2022-05-18T05:08:25.4549805Z 2022-05-18T05:08:25.4549907Z OK 2022-05-18T05:08:25.4550050Z 2022-05-18T05:08:25.4550186Z Generating XML reports... 
2022-05-18T05:08:25.4593704Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050820.xml 2022-05-18T05:08:26.6554295Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:08:26.6568522Z 2022-05-18T05:08:26.6568987Z Running tests... 2022-05-18T05:08:26.6569477Z ---------------------------------------------------------------------- 2022-05-18T05:08:26.6580838Z test_ddp_checkpointing_once_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T05:08:28.3201720Z DDP works as expected when layer is checkpointed only once. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:28.3564305Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 71934 2022-05-18T05:08:28.3674816Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 71935 2022-05-18T05:08:29.2662849Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:29.2664118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:30.6144544Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfzphhkkr 2022-05-18T05:08:30.6145185Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfzphhkkr/_remote_module_non_scriptable.py 2022-05-18T05:08:30.6405028Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7dfkss0k 2022-05-18T05:08:30.6407619Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7dfkss0k/_remote_module_non_scriptable.py 2022-05-18T05:08:30.9552374Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:30.9552910Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:30.9879286Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:30.9879784Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:31.0046483Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T05:08:31.0047347Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T05:08:31.0048502Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T05:08:31.0049340Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T05:08:31.0169282Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:31.0169780Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:31.0403761Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
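The DistributedDataParallelTest checkpointing cases above combine DDP with torch.utils.checkpoint; the use_reentrant_True/False suffixes select between the two checkpoint implementations. A condensed sketch of the pattern being exercised, assuming an initialized default gloo group (layer sizes and the single checkpointed layer are illustrative, not the test's model):

import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.checkpoint import checkpoint

class CheckpointedNet(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.l1 = nn.Linear(20, 20)
        self.l2 = nn.Linear(20, 20)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Checkpoint one layer; use_reentrant toggles between the two autograd
        # checkpoint implementations the tests compare.
        h = checkpoint(self.l1, x, use_reentrant=False)
        return self.l2(h)

def train_step() -> None:
    model = DDP(CheckpointedNet())  # CPU/gloo here; the CUDA variants pass device_ids
    out = model(torch.randn(8, 20))
    out.sum().backward()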
2022-05-18T05:08:31.0404263Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:31.0727245Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:31.0727729Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:31.1010882Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:31.1011378Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:31.4765658Z ok (4.819s) 2022-05-18T05:08:31.4766079Z 2022-05-18T05:08:31.4766755Z ---------------------------------------------------------------------- 2022-05-18T05:08:31.4767421Z Ran 1 test in 4.820s 2022-05-18T05:08:31.4767719Z 2022-05-18T05:08:31.4767895Z OK 2022-05-18T05:08:31.4768162Z 2022-05-18T05:08:31.4768410Z Generating XML reports... 2022-05-18T05:08:31.4815704Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050826.xml 2022-05-18T05:08:32.6265611Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:08:32.6279891Z 2022-05-18T05:08:32.6280121Z Running tests... 2022-05-18T05:08:32.6280557Z ---------------------------------------------------------------------- 2022-05-18T05:08:32.6292041Z test_ddp_checkpointing_once_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T05:08:34.2725615Z DDP works as expected when layer is checkpointed only once. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:34.3086568Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72055 2022-05-18T05:08:34.3197760Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72056 2022-05-18T05:08:35.2391959Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:35.2576807Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:36.5853084Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf2499mov 2022-05-18T05:08:36.5853704Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf2499mov/_remote_module_non_scriptable.py 2022-05-18T05:08:36.6104731Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzkomnzjt 2022-05-18T05:08:36.6107393Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzkomnzjt/_remote_module_non_scriptable.py 2022-05-18T05:08:36.9218302Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:36.9218859Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:36.9562206Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:36.9562744Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:36.9735596Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 
2022-05-18T05:08:36.9736506Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T05:08:36.9737669Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T05:08:36.9738482Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T05:08:36.9858175Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:36.9858713Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:37.0096793Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:37.0097312Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:37.0434494Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:37.0435039Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:37.0722758Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:37.0723305Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:37.4295691Z ok (4.801s) 2022-05-18T05:08:37.4295922Z 2022-05-18T05:08:37.4296333Z ---------------------------------------------------------------------- 2022-05-18T05:08:37.4296660Z Ran 1 test in 4.802s 2022-05-18T05:08:37.4296829Z 2022-05-18T05:08:37.4297265Z OK 2022-05-18T05:08:37.4297415Z 2022-05-18T05:08:37.4297558Z Generating XML reports... 2022-05-18T05:08:37.4341832Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050832.xml 2022-05-18T05:08:38.6151080Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:08:38.6165948Z 2022-05-18T05:08:38.6166180Z Running tests... 2022-05-18T05:08:38.6166944Z ---------------------------------------------------------------------- 2022-05-18T05:08:38.6175451Z test_ddp_checkpointing_twice_static_graph_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T05:08:40.2270804Z Regardless of reentrant or non-reentrant checkpointing impl, ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:40.2627767Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72176 2022-05-18T05:08:40.2736116Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72177 2022-05-18T05:08:41.1879201Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:41.1970911Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:42.5205756Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjvymlj_0 2022-05-18T05:08:42.5206404Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjvymlj_0/_remote_module_non_scriptable.py 2022-05-18T05:08:42.5406327Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp91bda0sa 2022-05-18T05:08:42.5408826Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp91bda0sa/_remote_module_non_scriptable.py 2022-05-18T05:08:42.8608499Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:42.8609076Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:42.8934671Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:42.8935455Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:43.1817467Z ok (4.565s) 2022-05-18T05:08:43.1817762Z 2022-05-18T05:08:43.1818312Z ---------------------------------------------------------------------- 2022-05-18T05:08:43.1818674Z Ran 1 test in 4.565s 2022-05-18T05:08:43.1818824Z 2022-05-18T05:08:43.1818949Z OK 2022-05-18T05:08:43.1819088Z 2022-05-18T05:08:43.1819244Z Generating XML reports... 2022-05-18T05:08:43.1861641Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050838.xml 2022-05-18T05:08:44.3550629Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:08:44.3564285Z 2022-05-18T05:08:44.3564555Z Running tests... 2022-05-18T05:08:44.3565017Z ---------------------------------------------------------------------- 2022-05-18T05:08:44.3574021Z test_ddp_checkpointing_twice_static_graph_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T05:08:45.9938454Z Regardless of reentrant or non-reentrant checkpointing impl, ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:46.0300469Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72297 2022-05-18T05:08:46.0411107Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72298 2022-05-18T05:08:46.9440649Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:46.9442069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:48.2940300Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppuqs90yc 2022-05-18T05:08:48.2940959Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppuqs90yc/_remote_module_non_scriptable.py 2022-05-18T05:08:48.3200237Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpopepv6aj 2022-05-18T05:08:48.3202935Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpopepv6aj/_remote_module_non_scriptable.py 2022-05-18T05:08:48.6393901Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:48.6394470Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:48.6738982Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:48.6739529Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:49.0494768Z ok (4.693s) 2022-05-18T05:08:49.0494964Z 2022-05-18T05:08:49.0495374Z ---------------------------------------------------------------------- 2022-05-18T05:08:49.0495722Z Ran 1 test in 4.693s 2022-05-18T05:08:49.0495893Z 2022-05-18T05:08:49.0495997Z OK 2022-05-18T05:08:49.0496143Z 2022-05-18T05:08:49.0496279Z Generating XML reports... 2022-05-18T05:08:49.0540929Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050844.xml 2022-05-18T05:08:50.2388787Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:08:50.2403595Z 2022-05-18T05:08:50.2404053Z Running tests... 2022-05-18T05:08:50.2404551Z ---------------------------------------------------------------------- 2022-05-18T05:08:50.2417310Z test_ddp_checkpointing_twice_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T05:08:51.8839297Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:51.9200990Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72418 2022-05-18T05:08:51.9311502Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72419 2022-05-18T05:08:52.8578387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:08:52.8779661Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:54.2435485Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz60hqxek 2022-05-18T05:08:54.2436305Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps0ck2wp0 2022-05-18T05:08:54.2436887Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz60hqxek/_remote_module_non_scriptable.py 2022-05-18T05:08:54.2437803Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps0ck2wp0/_remote_module_non_scriptable.py 2022-05-18T05:08:54.5574358Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:54.5574914Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:54.5837541Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T05:08:54.5839115Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T05:08:54.6228248Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:54.6229057Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:08:55.0399424Z ok (4.799s) 2022-05-18T05:08:55.0399814Z 2022-05-18T05:08:55.0400352Z ---------------------------------------------------------------------- 2022-05-18T05:08:55.0400713Z Ran 1 test in 4.800s 2022-05-18T05:08:55.0400884Z 2022-05-18T05:08:55.0401222Z OK 2022-05-18T05:08:55.0401390Z 2022-05-18T05:08:55.0401527Z Generating XML reports... 2022-05-18T05:08:55.0446172Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050850.xml 2022-05-18T05:08:56.2132521Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:08:56.2148188Z 2022-05-18T05:08:56.2148465Z Running tests... 
2022-05-18T05:08:56.2148901Z ---------------------------------------------------------------------- 2022-05-18T05:08:56.2161573Z test_ddp_checkpointing_twice_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T05:08:57.8132977Z Checkpoitning twice fails for non-static graph with reentrant checkpoint ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:08:57.8486017Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72539 2022-05-18T05:08:57.8594952Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72540 2022-05-18T05:08:58.7864297Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:08:58.8065746Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:00.1483248Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3bsfjqw9 2022-05-18T05:09:00.1483849Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3bsfjqw9/_remote_module_non_scriptable.py 2022-05-18T05:09:00.1857411Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgcb92llf 2022-05-18T05:09:00.1859596Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgcb92llf/_remote_module_non_scriptable.py 2022-05-18T05:09:00.8678028Z ok (4.653s) 2022-05-18T05:09:00.8678255Z 2022-05-18T05:09:00.8678657Z ---------------------------------------------------------------------- 2022-05-18T05:09:00.8678989Z Ran 1 test in 4.653s 2022-05-18T05:09:00.8679159Z 2022-05-18T05:09:00.8679265Z OK 2022-05-18T05:09:00.8679417Z 2022-05-18T05:09:00.8679555Z Generating XML reports... 2022-05-18T05:09:00.8723041Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050856.xml 2022-05-18T05:09:02.0424382Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:09:02.0439367Z 2022-05-18T05:09:02.0439790Z Running tests... 2022-05-18T05:09:02.0440231Z ---------------------------------------------------------------------- 2022-05-18T05:09:02.0448393Z test_ddp_checkpointing_twice_weight_sharing (__main__.DistributedDataParallelTest) 2022-05-18T05:09:03.6514755Z Checkpointing should work with static graph in the case of checkpointing ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:03.6872780Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72660 2022-05-18T05:09:03.6986237Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72661 2022-05-18T05:09:04.5880490Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:04.5942533Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:05.9364889Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmps9tuudn2 2022-05-18T05:09:05.9365504Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmps9tuudn2/_remote_module_non_scriptable.py 2022-05-18T05:09:05.9706389Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0y3cj98z 2022-05-18T05:09:05.9708432Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0y3cj98z/_remote_module_non_scriptable.py 2022-05-18T05:09:06.2866063Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
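The *_twice_static_graph_* cases above declare the graph static, which is what allows checkpointing the same layer more than once; the warnings in the log mention the internal _set_static_graph hook, and the corresponding public constructor flag is static_graph. A hedged one-liner, assuming an initialized default gloo group and reusing the toy module from the earlier sketch:

from torch.nn.parallel import DistributedDataParallel as DDP

# static_graph=True is the public counterpart of the _set_static_graph call
# named in the warnings above.
ddp = DDP(CheckpointedNet(), static_graph=True)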
2022-05-18T05:09:06.2866629Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:06.3194507Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:06.3195033Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:06.6068256Z ok (4.563s) 2022-05-18T05:09:06.6068682Z 2022-05-18T05:09:06.6069460Z ---------------------------------------------------------------------- 2022-05-18T05:09:06.6069936Z Ran 1 test in 4.563s 2022-05-18T05:09:06.6070106Z 2022-05-18T05:09:06.6070202Z OK 2022-05-18T05:09:06.6070321Z 2022-05-18T05:09:06.6070461Z Generating XML reports... 2022-05-18T05:09:06.6113097Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050902.xml 2022-05-18T05:09:07.7939052Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:09:07.7953968Z 2022-05-18T05:09:07.7954197Z Running tests... 2022-05-18T05:09:07.7955137Z ---------------------------------------------------------------------- 2022-05-18T05:09:07.7968321Z test_ddp_checkpointing_unused_params_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T05:09:09.4424995Z With reentrant autograd checkpointing impl, DDP will fail when there are ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:09.4786013Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72781 2022-05-18T05:09:09.4896541Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72782 2022-05-18T05:09:10.3849948Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:10.4144747Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:11.7567801Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc7hk6rt4 2022-05-18T05:09:11.7568965Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc7hk6rt4/_remote_module_non_scriptable.py 2022-05-18T05:09:11.7823661Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpva71en5v 2022-05-18T05:09:11.7825997Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpva71en5v/_remote_module_non_scriptable.py 2022-05-18T05:09:12.0782993Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T05:09:12.0839932Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 2022-05-18T05:09:12.1131726Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T05:09:12.1133670Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T05:09:12.1136018Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T05:09:12.1136896Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T05:09:12.1247064Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:12.1248076Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:12.1806466Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:12.1807477Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:12.4994459Z ok (4.704s) 2022-05-18T05:09:12.4994706Z 2022-05-18T05:09:12.4995322Z ---------------------------------------------------------------------- 2022-05-18T05:09:12.4995686Z Ran 1 test in 4.704s 2022-05-18T05:09:12.4995857Z 2022-05-18T05:09:12.4995978Z OK 2022-05-18T05:09:12.4996100Z 2022-05-18T05:09:12.4996239Z Generating XML reports... 2022-05-18T05:09:12.5041292Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050907.xml 2022-05-18T05:09:13.7022892Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:09:13.7039124Z 2022-05-18T05:09:13.7039692Z Running tests... 2022-05-18T05:09:13.7040213Z ---------------------------------------------------------------------- 2022-05-18T05:09:13.7051974Z test_ddp_checkpointing_unused_params_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T05:09:15.3851842Z With reentrant autograd checkpointing impl, DDP will fail when there are ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:15.4214334Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 72902 2022-05-18T05:09:15.4326767Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 72903 2022-05-18T05:09:16.3616593Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:16.3805124Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:17.7361604Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpubq6dvey 2022-05-18T05:09:17.7366167Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpubq6dvey/_remote_module_non_scriptable.py 2022-05-18T05:09:17.7366811Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp82i_49hv 2022-05-18T05:09:17.7367567Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp82i_49hv/_remote_module_non_scriptable.py 2022-05-18T05:09:18.0581770Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T05:09:18.0582749Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T05:09:18.0584470Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T05:09:18.0585720Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T05:09:18.0718750Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:18.0719283Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:18.1159591Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:18.1160099Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:18.4414204Z ok (4.737s) 2022-05-18T05:09:18.4414445Z 2022-05-18T05:09:18.4414855Z ---------------------------------------------------------------------- 2022-05-18T05:09:18.4415182Z Ran 1 test in 4.738s 2022-05-18T05:09:18.4415355Z 2022-05-18T05:09:18.4415455Z OK 2022-05-18T05:09:18.4415593Z 2022-05-18T05:09:18.4415728Z Generating XML reports... 2022-05-18T05:09:18.4459685Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050913.xml 2022-05-18T05:09:19.6022037Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:09:19.6037868Z 2022-05-18T05:09:19.6038316Z Running tests... 2022-05-18T05:09:19.6038958Z ---------------------------------------------------------------------- 2022-05-18T05:09:19.6053691Z test_ddp_checkpointing_weight_sharing_use_reentrant_False (__main__.DistributedDataParallelTest) 2022-05-18T05:09:21.2470152Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:21.2823109Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73023 2022-05-18T05:09:21.2931858Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73024 2022-05-18T05:09:22.1934164Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:22.2085800Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:23.5584141Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpss7rv0dv 2022-05-18T05:09:23.5585097Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpss7rv0dv/_remote_module_non_scriptable.py 2022-05-18T05:09:23.5656128Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0pu3qrjp 2022-05-18T05:09:23.5658953Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0pu3qrjp/_remote_module_non_scriptable.py 2022-05-18T05:09:23.8757925Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:23.8758477Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:23.9132988Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:23.9133498Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:24.2014242Z ok (4.597s) 2022-05-18T05:09:24.2014631Z 2022-05-18T05:09:24.2015383Z ---------------------------------------------------------------------- 2022-05-18T05:09:24.2015798Z Ran 1 test in 4.598s 2022-05-18T05:09:24.2015969Z 2022-05-18T05:09:24.2016067Z OK 2022-05-18T05:09:24.2016207Z 2022-05-18T05:09:24.2016341Z Generating XML reports... 2022-05-18T05:09:24.2058886Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050919.xml 2022-05-18T05:09:25.3873291Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:09:25.3887656Z 2022-05-18T05:09:25.3887894Z Running tests... 2022-05-18T05:09:25.3888342Z ---------------------------------------------------------------------- 2022-05-18T05:09:25.3902973Z test_ddp_checkpointing_weight_sharing_use_reentrant_True (__main__.DistributedDataParallelTest) 2022-05-18T05:09:27.0342351Z Test that checkpointing with weight sharing works. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:27.0701906Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73144 2022-05-18T05:09:27.0814097Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73145 2022-05-18T05:09:28.0189117Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:28.0241587Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:29.3927825Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpo6ierf5i 2022-05-18T05:09:29.3928743Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpo6ierf5i/_remote_module_non_scriptable.py 2022-05-18T05:09:29.3959977Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpg6vhdbd3 2022-05-18T05:09:29.3963253Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpg6vhdbd3/_remote_module_non_scriptable.py 2022-05-18T05:09:29.7119749Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:29.7120369Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:29.7447441Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:29.7447958Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:29.7675363Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:29.7675859Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:29.7998127Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:29.7998620Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:09:30.0899634Z ok (4.701s) 2022-05-18T05:09:30.0899844Z 2022-05-18T05:09:30.0900458Z ---------------------------------------------------------------------- 2022-05-18T05:09:30.0900813Z Ran 1 test in 4.701s 2022-05-18T05:09:30.0900983Z 2022-05-18T05:09:30.0902048Z OK 2022-05-18T05:09:30.0902248Z 2022-05-18T05:09:30.0902656Z Generating XML reports... 2022-05-18T05:09:30.0943599Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050925.xml 2022-05-18T05:09:31.2757450Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:09:31.2773101Z 2022-05-18T05:09:31.2773495Z Running tests... 2022-05-18T05:09:31.2774095Z ---------------------------------------------------------------------- 2022-05-18T05:09:31.2783426Z test_ddp_comm_hook_future_passing_cpu (__main__.DistributedDataParallelTest) 2022-05-18T05:09:32.9258214Z This unit test verifies whether the Future object is passed properly. ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:32.9608741Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73265 2022-05-18T05:09:32.9717690Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73266 2022-05-18T05:09:33.9012433Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:33.9200934Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:33.9406851Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1g33i841 2022-05-18T05:09:33.9409371Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1g33i841/_remote_module_non_scriptable.py 2022-05-18T05:09:33.9409931Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpa9x5ao9_ 2022-05-18T05:09:33.9412665Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpa9x5ao9_/_remote_module_non_scriptable.py 2022-05-18T05:09:34.1762453Z ok (2.899s) 2022-05-18T05:09:34.1763021Z 2022-05-18T05:09:34.1763406Z ---------------------------------------------------------------------- 2022-05-18T05:09:34.1763759Z Ran 1 test in 2.899s 2022-05-18T05:09:34.1763931Z 2022-05-18T05:09:34.1764034Z OK 2022-05-18T05:09:34.1764177Z 2022-05-18T05:09:34.1764320Z Generating XML reports... 2022-05-18T05:09:34.1808478Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050931.xml 2022-05-18T05:09:35.3457760Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:09:35.3472570Z 2022-05-18T05:09:35.3472823Z Running tests... 2022-05-18T05:09:35.3473270Z ---------------------------------------------------------------------- 2022-05-18T05:09:35.3482674Z test_ddp_comm_hook_future_passing_gpu_gloo (__main__.DistributedDataParallelTest) 2022-05-18T05:09:37.0101589Z This unit test verifies whether the Future object is passed properly using gloo backend. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:37.0456956Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73384 2022-05-18T05:09:37.0565484Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73385 2022-05-18T05:09:37.8993818Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:37.9499339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:39.2810832Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5yywngyh 2022-05-18T05:09:39.2812004Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5yywngyh/_remote_module_non_scriptable.py 2022-05-18T05:09:39.3047934Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3c97z_n7 2022-05-18T05:09:39.3050762Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3c97z_n7/_remote_module_non_scriptable.py 2022-05-18T05:09:39.6652605Z ok (4.318s) 2022-05-18T05:09:39.6652841Z 2022-05-18T05:09:39.6653260Z ---------------------------------------------------------------------- 2022-05-18T05:09:39.6653616Z Ran 1 test in 4.318s 2022-05-18T05:09:39.6653796Z 2022-05-18T05:09:39.6653875Z OK 2022-05-18T05:09:39.6654022Z 2022-05-18T05:09:39.6654157Z Generating XML reports... 
2022-05-18T05:09:39.6699680Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050935.xml 2022-05-18T05:09:40.8702867Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:09:40.8718571Z 2022-05-18T05:09:40.8719001Z Running tests... 2022-05-18T05:09:40.8719524Z ---------------------------------------------------------------------- 2022-05-18T05:09:40.8731125Z test_ddp_comm_hook_register_just_once (__main__.DistributedDataParallelTest) 2022-05-18T05:09:42.5078261Z DDP communication hook can only be registered once. This test validates whether ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:42.5432435Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73505 2022-05-18T05:09:42.5540426Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73506 2022-05-18T05:09:43.4349675Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:43.4427934Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:43.4736111Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprnt92k80 2022-05-18T05:09:43.4738761Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprnt92k80/_remote_module_non_scriptable.py 2022-05-18T05:09:43.4741146Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp13x65tcb 2022-05-18T05:09:43.4744400Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp13x65tcb/_remote_module_non_scriptable.py 2022-05-18T05:09:43.6584609Z ok (2.786s) 2022-05-18T05:09:43.6585491Z 2022-05-18T05:09:43.6586111Z ---------------------------------------------------------------------- 2022-05-18T05:09:43.6586461Z Ran 1 test in 2.787s 2022-05-18T05:09:43.6586628Z 2022-05-18T05:09:43.6586707Z OK 2022-05-18T05:09:43.6586847Z 2022-05-18T05:09:43.6586983Z Generating XML reports... 2022-05-18T05:09:43.6629894Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050940.xml 2022-05-18T05:09:44.8241195Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:09:44.8256064Z 2022-05-18T05:09:44.8256364Z Running tests... 2022-05-18T05:09:44.8256859Z ---------------------------------------------------------------------- 2022-05-18T05:09:44.8270644Z test_ddp_comm_hook_sparse_gradients (__main__.DistributedDataParallelTest) 2022-05-18T05:09:46.4692353Z Runs "test_sparse_gradients" unit test with DDP communication hook. We define a ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:46.5044310Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73620 2022-05-18T05:09:46.5156399Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73621 2022-05-18T05:09:47.3962330Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:47.4082536Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:47.4295815Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6zem_uaf 2022-05-18T05:09:47.4298488Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6zem_uaf/_remote_module_non_scriptable.py 2022-05-18T05:09:47.4300322Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpttp7htuv 2022-05-18T05:09:47.4303354Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpttp7htuv/_remote_module_non_scriptable.py 2022-05-18T05:09:47.6199179Z ok (2.794s) 2022-05-18T05:09:47.6199407Z 2022-05-18T05:09:47.6199809Z ---------------------------------------------------------------------- 2022-05-18T05:09:47.6200159Z Ran 1 test in 2.794s 2022-05-18T05:09:47.6200326Z 2022-05-18T05:09:47.6200423Z OK 2022-05-18T05:09:47.6200564Z 2022-05-18T05:09:47.6201376Z Generating XML reports... 2022-05-18T05:09:47.6243129Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050944.xml 2022-05-18T05:09:48.7879713Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:09:48.7895845Z 2022-05-18T05:09:48.7896394Z Running tests... 2022-05-18T05:09:48.7896929Z ---------------------------------------------------------------------- 2022-05-18T05:09:48.7909635Z test_ddp_invalid_comm_hook_init (__main__.DistributedDataParallelTest) 2022-05-18T05:09:50.4306982Z This unit test makes sure that register_comm_hook properly checks the format ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:50.4661387Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73769 2022-05-18T05:09:50.4769270Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73770 2022-05-18T05:09:51.3892513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:51.4136303Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:51.4403903Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpmh57yy06 2022-05-18T05:09:51.4404461Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp50c7hqa2 2022-05-18T05:09:51.4406732Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpmh57yy06/_remote_module_non_scriptable.py 2022-05-18T05:09:51.4407524Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp50c7hqa2/_remote_module_non_scriptable.py 2022-05-18T05:09:51.6814558Z ok (2.891s) 2022-05-18T05:09:51.6814912Z 2022-05-18T05:09:51.6816079Z ---------------------------------------------------------------------- 2022-05-18T05:09:51.6816441Z Ran 1 test in 2.892s 2022-05-18T05:09:51.6816618Z 2022-05-18T05:09:51.6816698Z OK 2022-05-18T05:09:51.6816836Z 2022-05-18T05:09:51.6816977Z Generating XML reports... 
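The ddp_comm_hook cases above exercise DistributedDataParallel.register_comm_hook: the hook receives each gradient bucket and must return a Future, it may only be registered once, and its signature and return annotation are validated. A minimal allreduce-style hook as a sketch (the averaging logic is illustrative, not the hook used by the tests):

import torch
import torch.distributed as dist

def allreduce_hook(process_group: dist.ProcessGroup,
                   bucket: dist.GradBucket) -> torch.futures.Future[torch.Tensor]:
    group = process_group if process_group is not None else dist.group.WORLD
    world_size = dist.get_world_size(group)
    fut = dist.all_reduce(bucket.buffer(), group=group, async_op=True).get_future()
    # Average the reduced gradients before handing them back to the DDP reducer.
    return fut.then(lambda f: f.value()[0] / world_size)

# ddp_model.register_comm_hook(state=None, hook=allreduce_hook)
# A second register_comm_hook call on the same model is expected to raise,
# which is what test_ddp_comm_hook_register_just_once checks.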
2022-05-18T05:09:51.6860380Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050948.xml 2022-05-18T05:09:52.8499150Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:09:52.8514303Z 2022-05-18T05:09:52.8514540Z Running tests... 2022-05-18T05:09:52.8514977Z ---------------------------------------------------------------------- 2022-05-18T05:09:52.8530401Z test_ddp_invalid_comm_hook_return_type (__main__.DistributedDataParallelTest) 2022-05-18T05:09:54.5058045Z This test checks whether return annotation checked properly if defined. It also ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:54.5415370Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 73884 2022-05-18T05:09:54.5523097Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 73885 2022-05-18T05:09:55.4772603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:55.4929405Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:55.5182336Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzamnnbmi 2022-05-18T05:09:55.5184835Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzamnnbmi/_remote_module_non_scriptable.py 2022-05-18T05:09:55.5185398Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzl2y2lf5 2022-05-18T05:09:55.5188457Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzl2y2lf5/_remote_module_non_scriptable.py 2022-05-18T05:09:55.7568051Z ok (2.905s) 2022-05-18T05:09:55.7568273Z 2022-05-18T05:09:55.7568705Z ---------------------------------------------------------------------- 2022-05-18T05:09:55.7569031Z Ran 1 test in 2.905s 2022-05-18T05:09:55.7569207Z 2022-05-18T05:09:55.7569313Z OK 2022-05-18T05:09:55.7569459Z 2022-05-18T05:09:55.7569597Z Generating XML reports... 2022-05-18T05:09:55.7619696Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050952.xml 2022-05-18T05:09:56.9238690Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:09:56.9253322Z 2022-05-18T05:09:56.9253564Z Running tests... 2022-05-18T05:09:56.9254016Z ---------------------------------------------------------------------- 2022-05-18T05:09:56.9275713Z test_find_unused_parameters_when_unused_parameters_empty (__main__.DistributedDataParallelTest) 2022-05-18T05:09:58.5674253Z An empty unused_parameters array does not imply find_unused_parameters = ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:09:58.6035566Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74003 2022-05-18T05:09:58.6146688Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74004 2022-05-18T05:09:59.5086641Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:09:59.5137774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:09:59.5392438Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp9u6z6e08 2022-05-18T05:09:59.5395140Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp9u6z6e08/_remote_module_non_scriptable.py 2022-05-18T05:09:59.5395702Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf5chkq3f 2022-05-18T05:09:59.5398018Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf5chkq3f/_remote_module_non_scriptable.py 2022-05-18T05:09:59.5545497Z [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T05:10:01.2223006Z ok (4.297s) 2022-05-18T05:10:01.2223343Z 2022-05-18T05:10:01.2223877Z ---------------------------------------------------------------------- 2022-05-18T05:10:01.2224232Z Ran 1 test in 4.297s 2022-05-18T05:10:01.2224407Z 2022-05-18T05:10:01.2224505Z OK 2022-05-18T05:10:01.2224623Z 2022-05-18T05:10:01.2224759Z Generating XML reports... 2022-05-18T05:10:01.2268465Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050956.xml 2022-05-18T05:10:02.4506914Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:02.4521784Z 2022-05-18T05:10:02.4522030Z Running tests... 2022-05-18T05:10:02.4522456Z ---------------------------------------------------------------------- 2022-05-18T05:10:04.1146480Z test_global_local_unused_params_grad (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:10:04.1508540Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74124 2022-05-18T05:10:04.1619279Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74125 2022-05-18T05:10:05.0878518Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:10:05.1008885Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:10:05.1219412Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5cpd6p1l 2022-05-18T05:10:05.1220672Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpq7t6zrqh 2022-05-18T05:10:05.1221685Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5cpd6p1l/_remote_module_non_scriptable.py 2022-05-18T05:10:05.1222986Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpq7t6zrqh/_remote_module_non_scriptable.py 2022-05-18T05:10:06.7695972Z ok (4.317s) 2022-05-18T05:10:06.7696282Z 2022-05-18T05:10:06.7696698Z ---------------------------------------------------------------------- 2022-05-18T05:10:06.7697049Z Ran 1 test in 4.317s 2022-05-18T05:10:06.7697198Z 2022-05-18T05:10:06.7697292Z OK 2022-05-18T05:10:06.7697426Z 2022-05-18T05:10:06.7697622Z Generating XML reports... 2022-05-18T05:10:06.7740795Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051002.xml 2022-05-18T05:10:07.9726039Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:07.9742207Z 2022-05-18T05:10:07.9742432Z Running tests... 2022-05-18T05:10:07.9742864Z ---------------------------------------------------------------------- 2022-05-18T05:10:09.6357956Z test_global_local_unused_params_grad_with_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:10:09.6718975Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74245 2022-05-18T05:10:09.6829442Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74246 2022-05-18T05:10:10.6106847Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:10:10.6175035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:10:10.6415337Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpshqpge0x 2022-05-18T05:10:10.6415887Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2w60wnzv 2022-05-18T05:10:10.6418204Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpshqpge0x/_remote_module_non_scriptable.py 2022-05-18T05:10:10.6418774Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2w60wnzv/_remote_module_non_scriptable.py 2022-05-18T05:10:12.2905666Z ok (4.316s) 2022-05-18T05:10:12.2905888Z 2022-05-18T05:10:12.2906320Z ---------------------------------------------------------------------- 2022-05-18T05:10:12.2906961Z Ran 1 test in 4.316s 2022-05-18T05:10:12.2907160Z 2022-05-18T05:10:12.2907243Z OK 2022-05-18T05:10:12.2907383Z 2022-05-18T05:10:12.2907522Z Generating XML reports... 
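[Editor's note] The unused-parameter tests above revolve around DDP's find_unused_parameters flag; the reducer warning logged earlier fires when the flag is set but every parameter turns out to be used. A hedged sketch of the pattern, with a made-up module name and an already-initialized process group assumed:

    import torch
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    class MaybeUnused(nn.Module):          # hypothetical module, for illustration only
        def __init__(self):
            super().__init__()
            self.used = nn.Linear(4, 4)
            self.unused = nn.Linear(4, 4)  # never touched in forward()

        def forward(self, x):
            return self.used(x)

    model = DDP(MaybeUnused(), find_unused_parameters=True)
    loss = model(torch.randn(2, 4)).sum()
    loss.backward()   # with the flag, DDP marks 'unused' as ready instead of waiting
                      # for its gradient; without it, the reducer can error on a later
                      # iteration because that gradient never arrives.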
2022-05-18T05:10:12.2950852Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051007.xml 2022-05-18T05:10:13.4719482Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:13.4735475Z 2022-05-18T05:10:13.4735826Z Running tests... 2022-05-18T05:10:13.4736356Z ---------------------------------------------------------------------- 2022-05-18T05:10:15.1392350Z test_global_local_unused_params_grad_with_static_graph (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:10:15.1744504Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74366 2022-05-18T05:10:15.1853039Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74367 2022-05-18T05:10:16.0846978Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:10:16.1156079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:10:16.1363055Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpteul6ukp 2022-05-18T05:10:16.1364319Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmph4knisbs 2022-05-18T05:10:16.1365510Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpteul6ukp/_remote_module_non_scriptable.py 2022-05-18T05:10:16.1367303Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmph4knisbs/_remote_module_non_scriptable.py 2022-05-18T05:10:16.1515019Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T05:10:16.1515880Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T05:10:16.1517037Z /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:1737: UserWarning: You passed find_unused_parameters=true to DistributedDataParallel, `_set_static_graph` will detect unused parameters automatically, so you do not need to set find_unused_parameters=true, just be sure these unused parameters will not change during training loop while calling `_set_static_graph`. 2022-05-18T05:10:16.1517853Z "You passed find_unused_parameters=true to DistributedDataParallel, " 2022-05-18T05:10:17.7937940Z ok (4.320s) 2022-05-18T05:10:17.7938173Z 2022-05-18T05:10:17.7938576Z ---------------------------------------------------------------------- 2022-05-18T05:10:17.7938923Z Ran 1 test in 4.320s 2022-05-18T05:10:17.7939092Z 2022-05-18T05:10:17.7939178Z OK 2022-05-18T05:10:17.7939332Z 2022-05-18T05:10:17.7939473Z Generating XML reports... 2022-05-18T05:10:17.7982871Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051013.xml 2022-05-18T05:10:18.9668853Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:18.9683546Z 2022-05-18T05:10:18.9683976Z Running tests... 2022-05-18T05:10:18.9684476Z ---------------------------------------------------------------------- 2022-05-18T05:10:20.5860711Z test_gloo_backend_1gpu_module_device_ids_integer_list (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:10:20.6218192Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74487 2022-05-18T05:10:20.6325370Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74488 2022-05-18T05:10:21.4837422Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:10:21.5277003Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:10:22.8592295Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpge17k3b4 2022-05-18T05:10:22.8592918Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpge17k3b4/_remote_module_non_scriptable.py 2022-05-18T05:10:22.8908769Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp96vo81df 2022-05-18T05:10:22.8910976Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp96vo81df/_remote_module_non_scriptable.py 2022-05-18T05:10:23.2037157Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:10:23.2037695Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:10:23.5417120Z ok (4.573s) 2022-05-18T05:10:23.5417313Z 2022-05-18T05:10:23.5417726Z ---------------------------------------------------------------------- 2022-05-18T05:10:23.5418052Z Ran 1 test in 4.573s 2022-05-18T05:10:23.5418230Z 2022-05-18T05:10:23.5418331Z OK 2022-05-18T05:10:23.5418474Z 2022-05-18T05:10:23.5418609Z Generating XML reports... 2022-05-18T05:10:23.5463076Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051018.xml 2022-05-18T05:10:24.7277731Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:24.7292307Z 2022-05-18T05:10:24.7292813Z Running tests... 2022-05-18T05:10:24.7293348Z ---------------------------------------------------------------------- 2022-05-18T05:10:26.3639659Z test_gloo_backend_1gpu_module_device_ids_torch_device_list (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:10:26.3992351Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74608 2022-05-18T05:10:26.4101081Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74609 2022-05-18T05:10:27.3322265Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:10:27.3614062Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:10:28.7128738Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpljdwq2ef 2022-05-18T05:10:28.7129340Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpljdwq2ef/_remote_module_non_scriptable.py 2022-05-18T05:10:28.7270892Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpiym1jtce 2022-05-18T05:10:28.7273811Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpiym1jtce/_remote_module_non_scriptable.py 2022-05-18T05:10:29.0374987Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:10:29.0375534Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T05:10:29.3182248Z ok (4.589s) 2022-05-18T05:10:29.3182473Z 2022-05-18T05:10:29.3182930Z ---------------------------------------------------------------------- 2022-05-18T05:10:29.3183264Z Ran 1 test in 4.589s 2022-05-18T05:10:29.3183442Z 2022-05-18T05:10:29.3183538Z OK 2022-05-18T05:10:29.3183678Z 2022-05-18T05:10:29.3183813Z Generating XML reports... 2022-05-18T05:10:29.3227190Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051024.xml 2022-05-18T05:10:30.4912739Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:30.4926957Z 2022-05-18T05:10:30.4927111Z Running tests... 2022-05-18T05:10:30.4927565Z ---------------------------------------------------------------------- 2022-05-18T05:10:32.1067324Z test_gloo_backend_2gpu_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:10:32.1426264Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74729 2022-05-18T05:10:32.1534735Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74730 2022-05-18T05:10:33.0885392Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:10:33.0993041Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:10:33.2578363Z skip: Need at least 4 CUDA devices (2.765s) 2022-05-18T05:10:33.2578625Z 2022-05-18T05:10:33.2579030Z ---------------------------------------------------------------------- 2022-05-18T05:10:33.2579379Z Ran 1 test in 2.765s 2022-05-18T05:10:33.2579547Z 2022-05-18T05:10:33.2579661Z OK (skipped=1) 2022-05-18T05:10:33.2579818Z 2022-05-18T05:10:33.2579946Z Generating XML reports... 2022-05-18T05:10:33.2634976Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051030.xml 2022-05-18T05:10:34.4396829Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:34.4410996Z 2022-05-18T05:10:34.4411211Z Running tests... 2022-05-18T05:10:34.4411658Z ---------------------------------------------------------------------- 2022-05-18T05:10:36.1011914Z test_gloo_backend_4gpu_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:10:36.1365276Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74838 2022-05-18T05:10:36.1474378Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74839 2022-05-18T05:10:37.0710494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:10:37.0757131Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:10:37.2517033Z skip: Need at least 8 CUDA devices (2.810s) 2022-05-18T05:10:37.2517296Z 2022-05-18T05:10:37.2517702Z ---------------------------------------------------------------------- 2022-05-18T05:10:37.2518052Z Ran 1 test in 2.811s 2022-05-18T05:10:37.2518226Z 2022-05-18T05:10:37.2518341Z OK (skipped=1) 2022-05-18T05:10:37.2518511Z 2022-05-18T05:10:37.2518644Z Generating XML reports... 
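[Editor's note] The gloo-backend tests above build DDP with a single CUDA device passed via device_ids, once as a plain integer and once as a torch.device; the 2-GPU and 4-GPU variants are skipped because this runner exposes fewer devices than they require. A rough sketch of the two accepted forms, assuming an initialized process group and at least one GPU:

    import torch
    from torch.nn.parallel import DistributedDataParallel as DDP

    device = torch.device("cuda", 0)
    ddp_from_int = DDP(torch.nn.Linear(8, 8).to(device), device_ids=[0])
    ddp_from_dev = DDP(torch.nn.Linear(8, 8).to(device), device_ids=[torch.device("cuda", 0)])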
2022-05-18T05:10:37.2576026Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051034.xml 2022-05-18T05:10:38.4264963Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:38.4279415Z 2022-05-18T05:10:38.4279568Z Running tests... 2022-05-18T05:10:38.4280023Z ---------------------------------------------------------------------- 2022-05-18T05:10:40.0729772Z test_gloo_backend_cpu_module (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:10:40.1090150Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 74947 2022-05-18T05:10:40.1200475Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 74948 2022-05-18T05:10:41.0212768Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:10:41.0245707Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:10:41.0526243Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoc33z0vl 2022-05-18T05:10:41.0529070Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoc33z0vl/_remote_module_non_scriptable.py 2022-05-18T05:10:41.0529608Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpup83ng3k 2022-05-18T05:10:41.0532618Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpup83ng3k/_remote_module_non_scriptable.py 2022-05-18T05:10:41.0735593Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:10:41.0736107Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:10:41.3247914Z ok (2.896s) 2022-05-18T05:10:41.3248144Z 2022-05-18T05:10:41.3248870Z ---------------------------------------------------------------------- 2022-05-18T05:10:41.3249295Z Ran 1 test in 2.897s 2022-05-18T05:10:41.3249476Z 2022-05-18T05:10:41.3249586Z OK 2022-05-18T05:10:41.3249710Z 2022-05-18T05:10:41.3249855Z Generating XML reports... 2022-05-18T05:10:41.3297069Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051038.xml 2022-05-18T05:10:42.5065963Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:42.5081369Z 2022-05-18T05:10:42.5081599Z Running tests... 2022-05-18T05:10:42.5082049Z ---------------------------------------------------------------------- 2022-05-18T05:10:44.1730051Z test_gloo_backend_cpu_module_grad_is_view (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:10:44.2092665Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75066 2022-05-18T05:10:44.2202468Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75067 2022-05-18T05:10:45.1067245Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:10:45.1225548Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:10:45.1484203Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpj4d8rmzb 2022-05-18T05:10:45.1487386Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpj4d8rmzb/_remote_module_non_scriptable.py 2022-05-18T05:10:45.1488884Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgk11rmrn 2022-05-18T05:10:45.1492110Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgk11rmrn/_remote_module_non_scriptable.py 2022-05-18T05:10:45.1694776Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:10:45.1695302Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:10:45.3243090Z ok (2.816s) 2022-05-18T05:10:45.3243324Z 2022-05-18T05:10:45.3243705Z ---------------------------------------------------------------------- 2022-05-18T05:10:45.3244060Z Ran 1 test in 2.816s 2022-05-18T05:10:45.3244228Z 2022-05-18T05:10:45.3244327Z OK 2022-05-18T05:10:45.3244464Z 2022-05-18T05:10:45.3244599Z Generating XML reports... 2022-05-18T05:10:45.3287886Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051042.xml 2022-05-18T05:10:46.4952028Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:46.4968237Z 2022-05-18T05:10:46.4968657Z Running tests... 2022-05-18T05:10:46.4969179Z ---------------------------------------------------------------------- 2022-05-18T05:10:46.4989084Z test_ignored_output (__main__.DistributedDataParallelTest) 2022-05-18T05:10:48.1590451Z Test that the output of a model can be ignored and that there is no ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:10:48.1954431Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75185 2022-05-18T05:10:48.2066596Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75186 2022-05-18T05:10:49.1650265Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:10:49.1718372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:10:49.1958897Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4ntcfrcx 2022-05-18T05:10:49.1961477Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb57qbdxa 2022-05-18T05:10:49.1962025Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4ntcfrcx/_remote_module_non_scriptable.py 2022-05-18T05:10:49.1964531Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb57qbdxa/_remote_module_non_scriptable.py 2022-05-18T05:10:49.2182924Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:10:49.2183473Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T05:10:49.4110289Z ok (2.914s) 2022-05-18T05:10:49.4110696Z 2022-05-18T05:10:49.4111150Z ---------------------------------------------------------------------- 2022-05-18T05:10:49.4111502Z Ran 1 test in 2.914s 2022-05-18T05:10:49.4111669Z 2022-05-18T05:10:49.4111766Z OK 2022-05-18T05:10:49.4111886Z 2022-05-18T05:10:49.4112036Z Generating XML reports... 2022-05-18T05:10:49.4155339Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051046.xml 2022-05-18T05:10:50.5799999Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:50.5815580Z 2022-05-18T05:10:50.5816009Z Running tests... 2022-05-18T05:10:50.5816543Z ---------------------------------------------------------------------- 2022-05-18T05:10:50.5837777Z test_ignored_output_with_unused_parameters (__main__.DistributedDataParallelTest) 2022-05-18T05:10:52.2423746Z Test that the output of a model can be ignored and that there is no ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:10:52.2778256Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75334 2022-05-18T05:10:52.2887079Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75335 2022-05-18T05:10:53.1887089Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:10:53.1963987Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:10:53.2195598Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz_a55t6g 2022-05-18T05:10:53.2198196Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz_a55t6g/_remote_module_non_scriptable.py 2022-05-18T05:10:53.2199531Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3vuc3epi 2022-05-18T05:10:53.2202935Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3vuc3epi/_remote_module_non_scriptable.py 2022-05-18T05:10:53.4931925Z ok (2.911s) 2022-05-18T05:10:53.4932170Z 2022-05-18T05:10:53.4932588Z ---------------------------------------------------------------------- 2022-05-18T05:10:53.4932942Z Ran 1 test in 2.912s 2022-05-18T05:10:53.4933119Z 2022-05-18T05:10:53.4933231Z OK 2022-05-18T05:10:53.4933351Z 2022-05-18T05:10:53.4933495Z Generating XML reports... 2022-05-18T05:10:53.4980759Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051050.xml 2022-05-18T05:10:54.6726124Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:54.6741033Z 2022-05-18T05:10:54.6741432Z Running tests... 2022-05-18T05:10:54.6741957Z ---------------------------------------------------------------------- 2022-05-18T05:10:56.3225081Z test_invalid_powerSGD_state (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:10:56.3582368Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75483 2022-05-18T05:10:56.3692379Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75484 2022-05-18T05:10:57.2936608Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:10:57.2941974Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T05:10:57.2943967Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T05:10:57.2945239Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T05:10:57.2946328Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T05:10:57.2947394Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T05:10:57.2948452Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T05:10:57.3148068Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:10:57.3154760Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T05:10:57.3156207Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T05:10:57.3157264Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: 
matrix_approximation_rank = 1; start_powerSGD_iter = 0; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T05:10:57.3158332Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T05:10:57.3159428Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = True; warm_start = False; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T05:10:57.3160653Z INFO:torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook:PowerSGD config: matrix_approximation_rank = 1; start_powerSGD_iter = 1; min_compression_rate = 2; orthogonalization_epsilon = 0; use_error_feedback = False; warm_start = True; random_seed = 0; compression_stats_logging_frequency = 10000; batch_tensors_with_same_shape = False 2022-05-18T05:10:57.4736086Z ok (2.799s) 2022-05-18T05:10:57.4736312Z 2022-05-18T05:10:57.4736971Z ---------------------------------------------------------------------- 2022-05-18T05:10:57.4737322Z Ran 1 test in 2.799s 2022-05-18T05:10:57.4737493Z 2022-05-18T05:10:57.4737599Z OK 2022-05-18T05:10:57.4737735Z 2022-05-18T05:10:57.4737870Z Generating XML reports... 2022-05-18T05:10:57.4780738Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051054.xml 2022-05-18T05:10:58.6564862Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:10:58.6580249Z 2022-05-18T05:10:58.6580699Z Running tests... 2022-05-18T05:10:58.6581222Z ---------------------------------------------------------------------- 2022-05-18T05:11:00.3255918Z test_save_load_checkpoint (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:00.3620627Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75592 2022-05-18T05:11:00.3731307Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75593 2022-05-18T05:11:01.3018653Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:01.3627113Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:01.3836890Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:11:01.3837419Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:11:01.3838243Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:11:01.3838951Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:11:02.7310501Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpy80_8ncz 2022-05-18T05:11:02.7311378Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpy80_8ncz/_remote_module_non_scriptable.py 2022-05-18T05:11:02.7403479Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1pr6o360 2022-05-18T05:11:02.7406324Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1pr6o360/_remote_module_non_scriptable.py 2022-05-18T05:11:03.0464311Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:11:03.0464869Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:11:03.0605474Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:11:03.0605970Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:11:03.3818282Z ok (4.723s) 2022-05-18T05:11:03.3818520Z 2022-05-18T05:11:03.3818934Z ---------------------------------------------------------------------- 2022-05-18T05:11:03.3819291Z Ran 1 test in 4.724s 2022-05-18T05:11:03.3819463Z 2022-05-18T05:11:03.3819562Z OK 2022-05-18T05:11:03.3819710Z 2022-05-18T05:11:03.3819851Z Generating XML reports... 2022-05-18T05:11:03.3865024Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051058.xml 2022-05-18T05:11:04.5691425Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:11:04.5706563Z 2022-05-18T05:11:04.5706810Z Running tests... 2022-05-18T05:11:04.5707259Z ---------------------------------------------------------------------- 2022-05-18T05:11:06.2329192Z test_sparse_gradients (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:06.2691966Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75713 2022-05-18T05:11:06.2801629Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75714 2022-05-18T05:11:07.2103007Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:07.2281453Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:07.2520516Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpll96pmxn 2022-05-18T05:11:07.2522552Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpphf9pddf 2022-05-18T05:11:07.2523619Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpll96pmxn/_remote_module_non_scriptable.py 2022-05-18T05:11:07.2525230Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpphf9pddf/_remote_module_non_scriptable.py 2022-05-18T05:11:07.4846633Z ok (2.914s) 2022-05-18T05:11:07.4846872Z 2022-05-18T05:11:07.4847274Z ---------------------------------------------------------------------- 2022-05-18T05:11:07.4847603Z Ran 1 test in 2.914s 2022-05-18T05:11:07.4847772Z 2022-05-18T05:11:07.4847871Z OK 2022-05-18T05:11:07.4848008Z 2022-05-18T05:11:07.4848166Z Generating XML reports... 
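[Editor's note] test_save_load_checkpoint above exercises checkpointing a DDP-wrapped model; the usual pattern is to save the inner module's state_dict from one rank and reload it on every rank. A hedged sketch (the file path is illustrative, and an initialized process group is assumed):

    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    model = DDP(torch.nn.Linear(4, 4))
    ckpt_path = "/tmp/ddp_checkpoint.pt"    # hypothetical path

    if dist.get_rank() == 0:
        torch.save(model.module.state_dict(), ckpt_path)
    dist.barrier()                          # let rank 0 finish writing before others read
    state = torch.load(ckpt_path, map_location="cpu")
    model.module.load_state_dict(state)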
2022-05-18T05:11:07.4891883Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051104.xml 2022-05-18T05:11:08.6690383Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:11:08.6705324Z 2022-05-18T05:11:08.6705559Z Running tests... 2022-05-18T05:11:08.6706018Z ---------------------------------------------------------------------- 2022-05-18T05:11:10.3251909Z test_sparse_gradients_grad_is_view (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:10.3614396Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 75862 2022-05-18T05:11:10.3724678Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 75863 2022-05-18T05:11:11.2595181Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:11.3026249Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:11.3315121Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpyztb56s9 2022-05-18T05:11:11.3315667Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpb4sh5p6_ 2022-05-18T05:11:11.3318228Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpyztb56s9/_remote_module_non_scriptable.py 2022-05-18T05:11:11.3319134Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpb4sh5p6_/_remote_module_non_scriptable.py 2022-05-18T05:11:11.5768779Z ok (2.906s) 2022-05-18T05:11:11.5769146Z 2022-05-18T05:11:11.5769816Z ---------------------------------------------------------------------- 2022-05-18T05:11:11.5770769Z Ran 1 test in 2.906s 2022-05-18T05:11:11.5771079Z 2022-05-18T05:11:11.5771256Z OK 2022-05-18T05:11:11.5771514Z 2022-05-18T05:11:11.5771780Z Generating XML reports... 2022-05-18T05:11:11.5817293Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051108.xml 2022-05-18T05:11:12.7322855Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:11:12.7337674Z 2022-05-18T05:11:12.7337861Z Running tests... 2022-05-18T05:11:12.7338318Z ---------------------------------------------------------------------- 2022-05-18T05:11:14.3843055Z test_sync_batch_norm_empty_input (__main__.DistributedDataParallelTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:14.4197879Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76011 2022-05-18T05:11:14.4307302Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76012 2022-05-18T05:11:15.3566447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:15.3903211Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:16.7285924Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp68bn7lsf 2022-05-18T05:11:16.7286755Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp68bn7lsf/_remote_module_non_scriptable.py 2022-05-18T05:11:16.7664975Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpsebev2x0 2022-05-18T05:11:16.7666891Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpsebev2x0/_remote_module_non_scriptable.py 2022-05-18T05:11:17.9326446Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:11:17.9327024Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:11:18.4416174Z ok (5.707s) 2022-05-18T05:11:18.4416406Z 2022-05-18T05:11:18.4416818Z ---------------------------------------------------------------------- 2022-05-18T05:11:18.4417189Z Ran 1 test in 5.708s 2022-05-18T05:11:18.4417364Z 2022-05-18T05:11:18.4417443Z OK 2022-05-18T05:11:18.4417587Z 2022-05-18T05:11:18.4417753Z Generating XML reports... 2022-05-18T05:11:18.4463124Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051112.xml 2022-05-18T05:11:19.6371218Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:11:19.6387071Z 2022-05-18T05:11:19.6387304Z Running tests... 2022-05-18T05:11:19.6387750Z ---------------------------------------------------------------------- 2022-05-18T05:11:21.2986060Z test_sync_batch_norm_only_empty_input (__main__.DistributedDataParallelTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:21.3349318Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76132 2022-05-18T05:11:21.3460868Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76133 2022-05-18T05:11:22.2723058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:22.2956858Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:23.6327160Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjxo7irp2 2022-05-18T05:11:23.6327786Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjxo7irp2/_remote_module_non_scriptable.py 2022-05-18T05:11:23.6638561Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxo66v1t2 2022-05-18T05:11:23.6640993Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxo66v1t2/_remote_module_non_scriptable.py 2022-05-18T05:11:24.1921418Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:11:24.1921943Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
2022-05-18T05:11:24.5548960Z ok (4.916s) 2022-05-18T05:11:24.5549197Z 2022-05-18T05:11:24.5549618Z ---------------------------------------------------------------------- 2022-05-18T05:11:24.5549968Z Ran 1 test in 4.916s 2022-05-18T05:11:24.5550155Z 2022-05-18T05:11:24.5550242Z OK 2022-05-18T05:11:24.5550382Z 2022-05-18T05:11:24.5550524Z Generating XML reports... 2022-05-18T05:11:24.5594553Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051119.xml 2022-05-18T05:11:25.7604700Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:11:25.7620148Z 2022-05-18T05:11:25.7620403Z Running tests... 2022-05-18T05:11:25.7620850Z ---------------------------------------------------------------------- 2022-05-18T05:11:27.4157226Z test_allgather_basics (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:27.4511590Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76253 2022-05-18T05:11:27.4621279Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76254 2022-05-18T05:11:27.4732650Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 76255 2022-05-18T05:11:27.4842180Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 76256 2022-05-18T05:11:28.3803751Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:28.4010099Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:28.4423992Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:11:28.4445424Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:11:28.6891257Z ok (2.927s) 2022-05-18T05:11:28.6891480Z 2022-05-18T05:11:28.6891891Z ---------------------------------------------------------------------- 2022-05-18T05:11:28.6892243Z Ran 1 test in 2.927s 2022-05-18T05:11:28.6892412Z 2022-05-18T05:11:28.6892509Z OK 2022-05-18T05:11:28.6892648Z 2022-05-18T05:11:28.6893593Z Generating XML reports... 2022-05-18T05:11:28.6949083Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051125.xml 2022-05-18T05:11:29.8684159Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:11:29.8699504Z 2022-05-18T05:11:29.8699798Z Running tests... 2022-05-18T05:11:29.8700275Z ---------------------------------------------------------------------- 2022-05-18T05:11:31.5236859Z test_allgather_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:31.5591014Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76446 2022-05-18T05:11:31.5700484Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76447 2022-05-18T05:11:31.5810829Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 76448 2022-05-18T05:11:31.5922008Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 76449 2022-05-18T05:11:32.5125149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:32.5351547Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:32.5592475Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:11:32.5699743Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:11:34.6010897Z ok (4.731s) 2022-05-18T05:11:34.6011137Z 2022-05-18T05:11:34.6011550Z ---------------------------------------------------------------------- 2022-05-18T05:11:34.6011921Z Ran 1 test in 4.731s 2022-05-18T05:11:34.6012074Z 2022-05-18T05:11:34.6012172Z OK 2022-05-18T05:11:34.6012317Z 2022-05-18T05:11:34.6012457Z Generating XML reports... 2022-05-18T05:11:34.6069808Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051129.xml 2022-05-18T05:11:35.7997758Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:11:35.8012137Z 2022-05-18T05:11:35.8012396Z Running tests... 2022-05-18T05:11:35.8012853Z ---------------------------------------------------------------------- 2022-05-18T05:11:37.4599641Z test_allgather_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:37.4962498Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76643 2022-05-18T05:11:37.5073447Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76644 2022-05-18T05:11:37.5186121Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 76645 2022-05-18T05:11:37.5300379Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 76646 2022-05-18T05:11:38.4710929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:38.5291930Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:11:38.5458329Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:38.5710414Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:11:38.8350791Z ok (3.034s) 2022-05-18T05:11:38.8350998Z 2022-05-18T05:11:38.8351408Z ---------------------------------------------------------------------- 2022-05-18T05:11:38.8351753Z Ran 1 test in 3.034s 2022-05-18T05:11:38.8351903Z 2022-05-18T05:11:38.8351998Z OK 2022-05-18T05:11:38.8352134Z 2022-05-18T05:11:38.8352266Z Generating XML reports... 2022-05-18T05:11:38.8407176Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051135.xml 2022-05-18T05:11:40.0161074Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:11:40.0176158Z 2022-05-18T05:11:40.0176626Z Running tests... 
2022-05-18T05:11:40.0177255Z ---------------------------------------------------------------------- 2022-05-18T05:11:41.6491833Z test_allgather_coalesced_async (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:41.6846618Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 76836 2022-05-18T05:11:41.6955688Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 76837 2022-05-18T05:11:41.7068662Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 76838 2022-05-18T05:11:41.7178726Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 76839 2022-05-18T05:11:42.6652353Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:42.6693333Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:42.6887464Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:11:42.6916836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:11:42.7129999Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:11:42.7231990Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:11:42.7337130Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T05:11:42.7337745Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T05:11:42.7338533Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:11:42.7339245Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:11:42.7437149Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:11:42.7437866Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:11:43.0231076Z ok (3.005s) 2022-05-18T05:11:43.0231431Z 2022-05-18T05:11:43.0231830Z ---------------------------------------------------------------------- 2022-05-18T05:11:43.0232174Z Ran 1 test in 3.005s 2022-05-18T05:11:43.0232323Z 2022-05-18T05:11:43.0232419Z OK 2022-05-18T05:11:43.0232555Z 2022-05-18T05:11:43.0232687Z Generating XML reports... 2022-05-18T05:11:43.0287526Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051140.xml 2022-05-18T05:11:44.1985855Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:11:44.2000684Z 2022-05-18T05:11:44.2000840Z Running tests... 2022-05-18T05:11:44.2001561Z ---------------------------------------------------------------------- 2022-05-18T05:11:45.8399995Z test_allgather_coalesced_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:45.8760321Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77029 2022-05-18T05:11:45.8871614Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77030 2022-05-18T05:11:45.8984412Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 77031 2022-05-18T05:11:45.9096468Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 77032 2022-05-18T05:11:46.8332629Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:11:46.8391751Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:46.8737466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:46.9047370Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:11:47.2147966Z ok (3.014s) 2022-05-18T05:11:47.2148214Z 2022-05-18T05:11:47.2148600Z ---------------------------------------------------------------------- 2022-05-18T05:11:47.2148968Z Ran 1 test in 3.015s 2022-05-18T05:11:47.2149139Z 2022-05-18T05:11:47.2149239Z OK 2022-05-18T05:11:47.2149380Z 2022-05-18T05:11:47.2149527Z Generating XML reports... 2022-05-18T05:11:47.2207014Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051144.xml 2022-05-18T05:11:48.3910357Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:11:48.3925258Z 2022-05-18T05:11:48.3925471Z Running tests... 2022-05-18T05:11:48.3926346Z ---------------------------------------------------------------------- 2022-05-18T05:11:50.0399925Z test_allgather_noncontiguous_input (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:50.0755915Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77222 2022-05-18T05:11:50.0865583Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77223 2022-05-18T05:11:50.0977514Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 77224 2022-05-18T05:11:50.1089818Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 77225 2022-05-18T05:11:50.9990885Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:51.0057973Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:51.0533965Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:11:51.0593186Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:11:51.3138978Z ok (2.921s) 2022-05-18T05:11:51.3139420Z 2022-05-18T05:11:51.3140163Z ---------------------------------------------------------------------- 2022-05-18T05:11:51.3140504Z Ran 1 test in 2.921s 2022-05-18T05:11:51.3140690Z 2022-05-18T05:11:51.3140790Z OK 2022-05-18T05:11:51.3140929Z 2022-05-18T05:11:51.3141071Z Generating XML reports... 2022-05-18T05:11:51.3197093Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051148.xml 2022-05-18T05:11:52.5060959Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:11:52.5076705Z 2022-05-18T05:11:52.5077121Z Running tests... 
2022-05-18T05:11:52.5077604Z ---------------------------------------------------------------------- 2022-05-18T05:11:54.1669806Z test_allgather_stress (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:54.2033460Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77415 2022-05-18T05:11:54.2145869Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77416 2022-05-18T05:11:54.2257191Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 77417 2022-05-18T05:11:54.2370365Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 77418 2022-05-18T05:11:55.1410385Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:55.1521155Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:55.2012867Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:11:55.2061130Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:11:56.0432954Z ok (3.535s) 2022-05-18T05:11:56.0433156Z 2022-05-18T05:11:56.0433777Z ---------------------------------------------------------------------- 2022-05-18T05:11:56.0434127Z Ran 1 test in 3.536s 2022-05-18T05:11:56.0434296Z 2022-05-18T05:11:56.0434393Z OK 2022-05-18T05:11:56.0434529Z 2022-05-18T05:11:56.0434647Z Generating XML reports... 2022-05-18T05:11:56.0489411Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051152.xml 2022-05-18T05:11:57.2261119Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:11:57.2275905Z 2022-05-18T05:11:57.2276182Z Running tests... 2022-05-18T05:11:57.2276646Z ---------------------------------------------------------------------- 2022-05-18T05:11:58.8269222Z test_allgather_stress_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:11:58.8624904Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77632 2022-05-18T05:11:58.8737277Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77633 2022-05-18T05:11:58.8846368Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 77634 2022-05-18T05:11:58.8956931Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 77635 2022-05-18T05:11:59.8646189Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:11:59.8658783Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:11:59.8675442Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:11:59.8693046Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:12:03.7105860Z ok (6.483s) 2022-05-18T05:12:03.7106209Z 2022-05-18T05:12:03.7106747Z ---------------------------------------------------------------------- 2022-05-18T05:12:03.7107113Z Ran 1 test in 6.483s 2022-05-18T05:12:03.7107286Z 2022-05-18T05:12:03.7107382Z OK 2022-05-18T05:12:03.7107521Z 2022-05-18T05:12:03.7107659Z Generating XML reports... 
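[Editor's note] The ProcessGroupGloo allgather tests above (basics, coalesced, stress, CUDA) all come down to the all_gather collective: every rank contributes one tensor and receives the tensors from all ranks. A minimal sketch, assuming an initialized gloo group:

    import torch
    import torch.distributed as dist

    world_size = dist.get_world_size()
    payload = torch.full((2,), float(dist.get_rank()))    # this rank's contribution
    gathered = [torch.empty(2) for _ in range(world_size)]
    dist.all_gather(gathered, payload)                    # gathered[i] holds rank i's tensor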
2022-05-18T05:12:03.7163242Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051157.xml 2022-05-18T05:12:04.8974696Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:12:04.8989912Z 2022-05-18T05:12:04.8990061Z Running tests... 2022-05-18T05:12:04.8990491Z ---------------------------------------------------------------------- 2022-05-18T05:12:06.5180912Z test_allreduce_basics (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:12:06.5538092Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 77853 2022-05-18T05:12:06.5645025Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 77854 2022-05-18T05:12:06.5755062Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 77855 2022-05-18T05:12:06.5866738Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 77856 2022-05-18T05:12:07.4757976Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:12:07.4891514Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:12:07.5265190Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:12:07.5294964Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:12:07.7914270Z ok (2.892s) 2022-05-18T05:12:07.7914511Z 2022-05-18T05:12:07.7914908Z ---------------------------------------------------------------------- 2022-05-18T05:12:07.7915260Z Ran 1 test in 2.892s 2022-05-18T05:12:07.7915429Z 2022-05-18T05:12:07.7915508Z OK 2022-05-18T05:12:07.7915645Z 2022-05-18T05:12:07.7915779Z Generating XML reports... 2022-05-18T05:12:07.7971477Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051204.xml 2022-05-18T05:12:08.9658077Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:12:08.9672985Z 2022-05-18T05:12:08.9673596Z Running tests... 2022-05-18T05:12:08.9674144Z ---------------------------------------------------------------------- 2022-05-18T05:12:10.6076875Z test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:12:10.6439019Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78046 2022-05-18T05:12:10.6549585Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78047 2022-05-18T05:12:10.6662477Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 78048 2022-05-18T05:12:10.6773321Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 78049 2022-05-18T05:12:11.5791973Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:12:11.6038148Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:12:11.6270984Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:12:11.6467360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:12:13.5864423Z ok (4.619s) 2022-05-18T05:12:13.5864687Z 2022-05-18T05:12:13.5865105Z ---------------------------------------------------------------------- 2022-05-18T05:12:13.5865456Z Ran 1 test in 4.619s 2022-05-18T05:12:13.5865721Z 2022-05-18T05:12:13.5865903Z OK 2022-05-18T05:12:13.5866116Z 2022-05-18T05:12:13.5866258Z Generating XML reports... 2022-05-18T05:12:13.5925689Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051208.xml 2022-05-18T05:12:14.7500915Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:12:14.7515430Z 2022-05-18T05:12:14.7515829Z Running tests... 2022-05-18T05:12:14.7516522Z ---------------------------------------------------------------------- 2022-05-18T05:12:16.3995250Z test_allreduce_basics_cuda_using_work_api (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:12:16.4359461Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78243 2022-05-18T05:12:16.4470358Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78244 2022-05-18T05:12:16.4582712Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 78245 2022-05-18T05:12:16.4694725Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 78246 2022-05-18T05:12:17.3877143Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:12:17.4278357Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:12:17.4285318Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:12:17.4482665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:12:19.3784312Z ok (4.627s) 2022-05-18T05:12:19.3784544Z 2022-05-18T05:12:19.3785274Z ---------------------------------------------------------------------- 2022-05-18T05:12:19.3785661Z Ran 1 test in 4.627s 2022-05-18T05:12:19.3785810Z 2022-05-18T05:12:19.3785914Z OK 2022-05-18T05:12:19.3786052Z 2022-05-18T05:12:19.3786191Z Generating XML reports... 2022-05-18T05:12:19.3840087Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051214.xml 2022-05-18T05:12:20.5554263Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:12:20.5568932Z 2022-05-18T05:12:20.5569080Z Running tests... 
2022-05-18T05:12:20.5569780Z ---------------------------------------------------------------------- 2022-05-18T05:12:22.1809830Z test_allreduce_basics_using_work_api (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:12:22.2166850Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78440 2022-05-18T05:12:22.2278439Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78441 2022-05-18T05:12:22.2389162Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 78442 2022-05-18T05:12:22.2499406Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 78443 2022-05-18T05:12:23.1824904Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:12:23.2369253Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:12:23.2977435Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:12:23.2988807Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:12:23.5551611Z ok (2.998s) 2022-05-18T05:12:23.5551992Z 2022-05-18T05:12:23.5552443Z ---------------------------------------------------------------------- 2022-05-18T05:12:23.5552801Z Ran 1 test in 2.998s 2022-05-18T05:12:23.5552967Z 2022-05-18T05:12:23.5553064Z OK 2022-05-18T05:12:23.5553210Z 2022-05-18T05:12:23.5553370Z Generating XML reports... 2022-05-18T05:12:23.5608160Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051220.xml 2022-05-18T05:12:24.7321732Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:12:24.7337429Z 2022-05-18T05:12:24.7337685Z Running tests... 2022-05-18T05:12:24.7338137Z ---------------------------------------------------------------------- 2022-05-18T05:12:26.4030170Z test_allreduce_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:12:26.4396147Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78633 2022-05-18T05:12:26.4507786Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78634 2022-05-18T05:12:26.4620879Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 78635 2022-05-18T05:12:26.4732662Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 78636 2022-05-18T05:12:27.3656487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:12:27.3668487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:12:27.3715513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:12:27.4279457Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:12:27.6782394Z ok (2.944s) 2022-05-18T05:12:27.6782633Z 2022-05-18T05:12:27.6783339Z ---------------------------------------------------------------------- 2022-05-18T05:12:27.6783688Z Ran 1 test in 2.944s 2022-05-18T05:12:27.6783853Z 2022-05-18T05:12:27.6783948Z OK 2022-05-18T05:12:27.6784093Z 2022-05-18T05:12:27.6784231Z Generating XML reports... 
2022-05-18T05:12:27.6838576Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051224.xml 2022-05-18T05:12:28.8558998Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:12:28.8573777Z 2022-05-18T05:12:28.8574026Z Running tests... 2022-05-18T05:12:28.8574484Z ---------------------------------------------------------------------- 2022-05-18T05:12:30.4921556Z test_allreduce_coalesced_async (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:12:30.5286760Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 78826 2022-05-18T05:12:30.5399377Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 78827 2022-05-18T05:12:30.5514737Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 78828 2022-05-18T05:12:30.5631573Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 78829 2022-05-18T05:12:31.4455202Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:12:31.4581739Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:12:31.4925014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:12:31.5121189Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:12:31.5300456Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:12:31.5403683Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:12:31.5404216Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T05:12:31.5404693Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T05:12:31.5405481Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:12:31.5406191Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:12:31.5406880Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:12:31.5506626Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:12:31.7678830Z ok (2.910s) 2022-05-18T05:12:31.7679052Z 2022-05-18T05:12:31.7679438Z ---------------------------------------------------------------------- 2022-05-18T05:12:31.7679805Z Ran 1 test in 2.910s 2022-05-18T05:12:31.7679986Z 2022-05-18T05:12:31.7680084Z OK 2022-05-18T05:12:31.7680222Z 2022-05-18T05:12:31.7680357Z Generating XML reports... 2022-05-18T05:12:31.7734837Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051228.xml 2022-05-18T05:12:32.9316702Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:12:32.9330616Z 2022-05-18T05:12:32.9331101Z Running tests... 2022-05-18T05:12:32.9331608Z ---------------------------------------------------------------------- 2022-05-18T05:12:34.5311144Z test_allreduce_coalesced_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:12:34.5665107Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79019 2022-05-18T05:12:34.5777825Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79020 2022-05-18T05:12:34.5886686Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79021 2022-05-18T05:12:34.5996005Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79022 2022-05-18T05:12:35.4875786Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:12:35.4972994Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:12:35.4992014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:12:35.5293820Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:12:35.8043721Z ok (2.871s) 2022-05-18T05:12:35.8043960Z 2022-05-18T05:12:35.8044345Z ---------------------------------------------------------------------- 2022-05-18T05:12:35.8044698Z Ran 1 test in 2.871s 2022-05-18T05:12:35.8044866Z 2022-05-18T05:12:35.8044963Z OK 2022-05-18T05:12:35.8045104Z 2022-05-18T05:12:35.8045243Z Generating XML reports... 2022-05-18T05:12:35.8100490Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051232.xml 2022-05-18T05:12:36.9831988Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:12:36.9847484Z 2022-05-18T05:12:36.9847732Z Running tests... 2022-05-18T05:12:36.9848176Z ---------------------------------------------------------------------- 2022-05-18T05:12:38.6440527Z test_allreduce_coalesced_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:12:38.6803935Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79212 2022-05-18T05:12:38.6916689Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79213 2022-05-18T05:12:38.7029955Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79214 2022-05-18T05:12:38.7141719Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79215 2022-05-18T05:12:39.6305364Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:12:39.6499569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:12:39.6518611Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:12:39.6585171Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:12:39.9189615Z ok (2.934s) 2022-05-18T05:12:39.9189880Z 2022-05-18T05:12:39.9190281Z ---------------------------------------------------------------------- 2022-05-18T05:12:39.9190631Z Ran 1 test in 2.934s 2022-05-18T05:12:39.9190798Z 2022-05-18T05:12:39.9191001Z OK 2022-05-18T05:12:39.9191118Z 2022-05-18T05:12:39.9191255Z Generating XML reports... 2022-05-18T05:12:39.9247599Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051236.xml 2022-05-18T05:12:41.0894670Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:12:41.0910695Z 2022-05-18T05:12:41.0911132Z Running tests... 
2022-05-18T05:12:41.0911657Z ---------------------------------------------------------------------- 2022-05-18T05:12:42.7442969Z test_allreduce_coalesced_checks_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:12:42.7797048Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79405 2022-05-18T05:12:42.7906742Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79406 2022-05-18T05:12:42.8018664Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79407 2022-05-18T05:12:42.8128540Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79408 2022-05-18T05:12:43.7152382Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:12:43.7153204Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:12:43.7178179Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:12:43.7388879Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:12:45.6218682Z ok (4.530s) 2022-05-18T05:12:45.6218902Z 2022-05-18T05:12:45.6219807Z ---------------------------------------------------------------------- 2022-05-18T05:12:45.6220209Z Ran 1 test in 4.531s 2022-05-18T05:12:45.6220383Z 2022-05-18T05:12:45.6220461Z OK 2022-05-18T05:12:45.6220601Z 2022-05-18T05:12:45.6220738Z Generating XML reports... 2022-05-18T05:12:45.6277775Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051241.xml 2022-05-18T05:12:46.8191613Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:12:46.8206653Z 2022-05-18T05:12:46.8207053Z Running tests... 2022-05-18T05:12:46.8207560Z ---------------------------------------------------------------------- 2022-05-18T05:12:48.4690272Z test_allreduce_coalesced_stress (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:12:48.5055429Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79602 2022-05-18T05:12:48.5165844Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79603 2022-05-18T05:12:48.5278561Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79604 2022-05-18T05:12:48.5390020Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79605 2022-05-18T05:12:49.4428176Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:12:49.4649134Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:12:49.4893933Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:12:49.4950684Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:12:50.1446502Z ok (3.324s) 2022-05-18T05:12:50.1446792Z 2022-05-18T05:12:50.1447352Z ---------------------------------------------------------------------- 2022-05-18T05:12:50.1447705Z Ran 1 test in 3.324s 2022-05-18T05:12:50.1447874Z 2022-05-18T05:12:50.1447979Z OK 2022-05-18T05:12:50.1448099Z 2022-05-18T05:12:50.1448250Z Generating XML reports... 
2022-05-18T05:12:50.1510658Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051246.xml 2022-05-18T05:12:51.3249304Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:12:51.3263492Z 2022-05-18T05:12:51.3263739Z Running tests... 2022-05-18T05:12:51.3264171Z ---------------------------------------------------------------------- 2022-05-18T05:12:52.9246657Z test_allreduce_stress (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:12:52.9600406Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 79819 2022-05-18T05:12:52.9710459Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 79820 2022-05-18T05:12:52.9818181Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 79821 2022-05-18T05:12:52.9927683Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 79822 2022-05-18T05:12:53.8873707Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:12:53.8891323Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:12:53.8950202Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:12:53.8957116Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:12:54.3983667Z ok (3.072s) 2022-05-18T05:12:54.3984232Z 2022-05-18T05:12:54.3984630Z ---------------------------------------------------------------------- 2022-05-18T05:12:54.3984962Z Ran 1 test in 3.072s 2022-05-18T05:12:54.3985131Z 2022-05-18T05:12:54.3985228Z OK 2022-05-18T05:12:54.3985365Z 2022-05-18T05:12:54.3985508Z Generating XML reports... 2022-05-18T05:12:54.4039819Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051251.xml 2022-05-18T05:12:55.5876956Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:12:55.5892085Z 2022-05-18T05:12:55.5892324Z Running tests... 2022-05-18T05:12:55.5892779Z ---------------------------------------------------------------------- 2022-05-18T05:12:57.2370323Z test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:12:57.2723617Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80036 2022-05-18T05:12:57.2832948Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80037 2022-05-18T05:12:57.2943699Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80038 2022-05-18T05:12:57.3054046Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80039 2022-05-18T05:12:58.2087956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:12:58.2108098Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:12:58.2237431Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:12:58.2485589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:00.7152661Z ok (5.126s) 2022-05-18T05:13:00.7152934Z 2022-05-18T05:13:00.7153348Z ---------------------------------------------------------------------- 2022-05-18T05:13:00.7153708Z Ran 1 test in 5.126s 2022-05-18T05:13:00.7153887Z 2022-05-18T05:13:00.7154007Z OK 2022-05-18T05:13:00.7154147Z 2022-05-18T05:13:00.7154266Z Generating XML reports... 2022-05-18T05:13:00.7210204Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051255.xml 2022-05-18T05:13:01.9158516Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:01.9175092Z 2022-05-18T05:13:01.9175520Z Running tests... 2022-05-18T05:13:01.9176034Z ---------------------------------------------------------------------- 2022-05-18T05:13:03.5667720Z test_barrier_implies_wait (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:03.6032125Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80257 2022-05-18T05:13:03.6144063Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80258 2022-05-18T05:13:03.6254628Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80259 2022-05-18T05:13:03.6366538Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80260 2022-05-18T05:13:04.5599170Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:13:04.6132734Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:13:04.6312856Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:13:04.6646980Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:04.9417955Z ok (3.024s) 2022-05-18T05:13:04.9418369Z 2022-05-18T05:13:04.9418955Z ---------------------------------------------------------------------- 2022-05-18T05:13:04.9419317Z Ran 1 test in 3.024s 2022-05-18T05:13:04.9419491Z 2022-05-18T05:13:04.9419587Z OK 2022-05-18T05:13:04.9419707Z 2022-05-18T05:13:04.9422437Z Generating XML reports... 2022-05-18T05:13:04.9473988Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051301.xml 2022-05-18T05:13:06.1158121Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:06.1173904Z 2022-05-18T05:13:06.1174283Z Running tests... 
2022-05-18T05:13:06.1174784Z ---------------------------------------------------------------------- 2022-05-18T05:13:07.7656487Z test_broadcast_basics (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:07.8018904Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80450 2022-05-18T05:13:07.8130721Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80451 2022-05-18T05:13:07.8242960Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80452 2022-05-18T05:13:07.8355424Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80453 2022-05-18T05:13:08.7355683Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:13:08.7454922Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:13:08.7760887Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:13:08.7902362Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:09.0403503Z ok (2.923s) 2022-05-18T05:13:09.0403881Z 2022-05-18T05:13:09.0404586Z ---------------------------------------------------------------------- 2022-05-18T05:13:09.0405217Z Ran 1 test in 2.923s 2022-05-18T05:13:09.0405531Z 2022-05-18T05:13:09.0405700Z OK 2022-05-18T05:13:09.0405964Z 2022-05-18T05:13:09.0406172Z Generating XML reports... 2022-05-18T05:13:09.0464404Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051306.xml 2022-05-18T05:13:10.2204657Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:10.2220586Z 2022-05-18T05:13:10.2220823Z Running tests... 2022-05-18T05:13:10.2221271Z ---------------------------------------------------------------------- 2022-05-18T05:13:11.8715033Z test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:11.9079248Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80643 2022-05-18T05:13:11.9189842Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80644 2022-05-18T05:13:11.9302305Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80645 2022-05-18T05:13:11.9414506Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80646 2022-05-18T05:13:12.8899384Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:13:12.9027292Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:13:12.9179181Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:13:12.9717354Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:14.8502713Z ok (4.628s) 2022-05-18T05:13:14.8503377Z 2022-05-18T05:13:14.8503975Z ---------------------------------------------------------------------- 2022-05-18T05:13:14.8504612Z Ran 1 test in 4.628s 2022-05-18T05:13:14.8504832Z 2022-05-18T05:13:14.8504948Z OK 2022-05-18T05:13:14.8505089Z 2022-05-18T05:13:14.8505229Z Generating XML reports... 
2022-05-18T05:13:14.8563800Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051310.xml 2022-05-18T05:13:16.0749905Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:16.0764194Z 2022-05-18T05:13:16.0764476Z Running tests... 2022-05-18T05:13:16.0764916Z ---------------------------------------------------------------------- 2022-05-18T05:13:17.7679279Z test_broadcast_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:17.8046671Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 80840 2022-05-18T05:13:17.8159993Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 80841 2022-05-18T05:13:17.8276147Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 80842 2022-05-18T05:13:17.8391474Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 80843 2022-05-18T05:13:18.7667671Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:13:18.7963446Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:18.8104912Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:13:18.8189284Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:13:19.1441708Z ok (3.067s) 2022-05-18T05:13:19.1442065Z 2022-05-18T05:13:19.1442759Z ---------------------------------------------------------------------- 2022-05-18T05:13:19.1443407Z Ran 1 test in 3.068s 2022-05-18T05:13:19.1443724Z 2022-05-18T05:13:19.1443910Z OK 2022-05-18T05:13:19.1444149Z 2022-05-18T05:13:19.1444403Z Generating XML reports... 2022-05-18T05:13:19.1502143Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051316.xml 2022-05-18T05:13:20.3418839Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:20.3434818Z 2022-05-18T05:13:20.3435330Z Running tests... 2022-05-18T05:13:20.3435846Z ---------------------------------------------------------------------- 2022-05-18T05:13:21.9629197Z test_broadcast_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:21.9988040Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81033 2022-05-18T05:13:22.0096219Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81034 2022-05-18T05:13:22.0208393Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81035 2022-05-18T05:13:22.0321867Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81036 2022-05-18T05:13:22.9371499Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:13:22.9546798Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:13:22.9772663Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:22.9898800Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:13:23.4374461Z ok (3.094s) 2022-05-18T05:13:23.4374698Z 2022-05-18T05:13:23.4375097Z ---------------------------------------------------------------------- 2022-05-18T05:13:23.4375450Z Ran 1 test in 3.094s 2022-05-18T05:13:23.4375620Z 2022-05-18T05:13:23.4375716Z OK 2022-05-18T05:13:23.4375854Z 2022-05-18T05:13:23.4375991Z Generating XML reports... 2022-05-18T05:13:23.4431192Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051320.xml 2022-05-18T05:13:24.6228004Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:24.6242534Z 2022-05-18T05:13:24.6243009Z Running tests... 2022-05-18T05:13:24.6243531Z ---------------------------------------------------------------------- 2022-05-18T05:13:26.2466884Z test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:26.2830850Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81250 2022-05-18T05:13:26.2943186Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81251 2022-05-18T05:13:26.3055620Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81252 2022-05-18T05:13:26.3167458Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81253 2022-05-18T05:13:27.2322035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:13:27.2855784Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:13:27.2856626Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:27.3153137Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:13:29.6277303Z ok (5.003s) 2022-05-18T05:13:29.6277559Z 2022-05-18T05:13:29.6277987Z ---------------------------------------------------------------------- 2022-05-18T05:13:29.6278339Z Ran 1 test in 5.003s 2022-05-18T05:13:29.6278512Z 2022-05-18T05:13:29.6278611Z OK 2022-05-18T05:13:29.6278752Z 2022-05-18T05:13:29.6278870Z Generating XML reports... 2022-05-18T05:13:29.6336071Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051324.xml 2022-05-18T05:13:30.8374634Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:30.8390996Z 2022-05-18T05:13:30.8391455Z Running tests... 
2022-05-18T05:13:30.8391957Z ---------------------------------------------------------------------- 2022-05-18T05:13:32.5048145Z test_empty_tensors (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:32.5412096Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81471 2022-05-18T05:13:32.5522793Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81472 2022-05-18T05:13:32.5634344Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81473 2022-05-18T05:13:32.5749445Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81474 2022-05-18T05:13:33.4884266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:13:33.5157604Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:33.5158693Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:13:33.5388942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:13:33.7798167Z ok (2.940s) 2022-05-18T05:13:33.7798386Z 2022-05-18T05:13:33.7798870Z ---------------------------------------------------------------------- 2022-05-18T05:13:33.7799397Z Ran 1 test in 2.941s 2022-05-18T05:13:33.7799572Z 2022-05-18T05:13:33.7799648Z OK 2022-05-18T05:13:33.7799798Z 2022-05-18T05:13:33.7799939Z Generating XML reports... 2022-05-18T05:13:33.7854925Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051330.xml 2022-05-18T05:13:34.9481693Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:34.9498259Z 2022-05-18T05:13:34.9498528Z Running tests... 2022-05-18T05:13:34.9499140Z ---------------------------------------------------------------------- 2022-05-18T05:13:36.6192753Z test_gather_basics (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:36.6560560Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81664 2022-05-18T05:13:36.6672160Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81665 2022-05-18T05:13:36.6785226Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81666 2022-05-18T05:13:36.6900559Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81667 2022-05-18T05:13:37.5869111Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:13:37.5950832Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:13:37.6011155Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:13:37.6342267Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:37.8947863Z ok (2.945s) 2022-05-18T05:13:37.8948104Z 2022-05-18T05:13:37.8948509Z ---------------------------------------------------------------------- 2022-05-18T05:13:37.8949128Z Ran 1 test in 2.945s 2022-05-18T05:13:37.8949322Z 2022-05-18T05:13:37.8949434Z OK 2022-05-18T05:13:37.8949574Z 2022-05-18T05:13:37.8949715Z Generating XML reports... 
2022-05-18T05:13:37.9006461Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051334.xml 2022-05-18T05:13:39.0578265Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:39.0592767Z 2022-05-18T05:13:39.0593080Z Running tests... 2022-05-18T05:13:39.0593540Z ---------------------------------------------------------------------- 2022-05-18T05:13:40.6670974Z test_gather_basics_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:40.7029633Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 81857 2022-05-18T05:13:40.7140775Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 81858 2022-05-18T05:13:40.7252665Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 81859 2022-05-18T05:13:40.7365593Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 81860 2022-05-18T05:13:41.6289225Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:41.6292236Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:13:41.6351118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:13:41.6400443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:13:43.5451169Z ok (4.485s) 2022-05-18T05:13:43.5451409Z 2022-05-18T05:13:43.5451814Z ---------------------------------------------------------------------- 2022-05-18T05:13:43.5452166Z Ran 1 test in 4.486s 2022-05-18T05:13:43.5452316Z 2022-05-18T05:13:43.5452413Z OK 2022-05-18T05:13:43.5452551Z 2022-05-18T05:13:43.5452685Z Generating XML reports... 2022-05-18T05:13:43.5507886Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051339.xml 2022-05-18T05:13:44.7247351Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:44.7262008Z 2022-05-18T05:13:44.7262434Z Running tests... 2022-05-18T05:13:44.7262950Z ---------------------------------------------------------------------- 2022-05-18T05:13:46.3433006Z test_gather_checks (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:46.3787758Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82054 2022-05-18T05:13:46.3896709Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82055 2022-05-18T05:13:46.4008491Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 82056 2022-05-18T05:13:46.4118031Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 82057 2022-05-18T05:13:47.3071908Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:13:47.3121994Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:47.3135242Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:13:47.3169111Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:13:47.6167214Z ok (2.890s) 2022-05-18T05:13:47.6167451Z 2022-05-18T05:13:47.6168153Z ---------------------------------------------------------------------- 2022-05-18T05:13:47.6169088Z Ran 1 test in 2.891s 2022-05-18T05:13:47.6169269Z 2022-05-18T05:13:47.6169367Z OK 2022-05-18T05:13:47.6169507Z 2022-05-18T05:13:47.6169637Z Generating XML reports... 2022-05-18T05:13:47.6225744Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051344.xml 2022-05-18T05:13:48.8083503Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:48.8099492Z 2022-05-18T05:13:48.8099741Z Running tests... 2022-05-18T05:13:48.8100188Z ---------------------------------------------------------------------- 2022-05-18T05:13:50.4547123Z test_gather_noncontiguous_input (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:50.4911055Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82247 2022-05-18T05:13:50.5022631Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82248 2022-05-18T05:13:50.5134058Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 82249 2022-05-18T05:13:50.5246539Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 82250 2022-05-18T05:13:51.4199089Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:13:51.4240962Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:51.4257159Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:13:51.4669536Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:13:51.7294342Z ok (2.919s) 2022-05-18T05:13:51.7294599Z 2022-05-18T05:13:51.7295015Z ---------------------------------------------------------------------- 2022-05-18T05:13:51.7295363Z Ran 1 test in 2.919s 2022-05-18T05:13:51.7295534Z 2022-05-18T05:13:51.7295641Z OK 2022-05-18T05:13:51.7295761Z 2022-05-18T05:13:51.7295921Z Generating XML reports... 2022-05-18T05:13:51.7351820Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051348.xml 2022-05-18T05:13:52.9056293Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:52.9073435Z 2022-05-18T05:13:52.9073772Z Running tests... 
2022-05-18T05:13:52.9074234Z ---------------------------------------------------------------------- 2022-05-18T05:13:54.5511377Z test_gather_stress (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:54.5875155Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82440 2022-05-18T05:13:54.5987909Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82441 2022-05-18T05:13:54.6100915Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 82442 2022-05-18T05:13:54.6217061Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 82443 2022-05-18T05:13:55.5082836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:13:55.5142740Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:13:55.5522447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:13:55.5756791Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:13:56.5281981Z ok (3.620s) 2022-05-18T05:13:56.5282212Z 2022-05-18T05:13:56.5282786Z ---------------------------------------------------------------------- 2022-05-18T05:13:56.5283199Z Ran 1 test in 3.621s 2022-05-18T05:13:56.5283370Z 2022-05-18T05:13:56.5283456Z OK 2022-05-18T05:13:56.5283601Z 2022-05-18T05:13:56.5283740Z Generating XML reports... 2022-05-18T05:13:56.5340754Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051352.xml 2022-05-18T05:13:57.7212868Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:13:57.7228977Z 2022-05-18T05:13:57.7229216Z Running tests... 2022-05-18T05:13:57.7229666Z ---------------------------------------------------------------------- 2022-05-18T05:13:59.3388139Z test_gather_stress_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:13:59.3751372Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82657 2022-05-18T05:13:59.3860452Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82658 2022-05-18T05:13:59.3972463Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 82659 2022-05-18T05:13:59.4084375Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 82660 2022-05-18T05:14:00.3737749Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:00.3864844Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:00.4195030Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:00.5218318Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:14:04.1210344Z ok (6.398s) 2022-05-18T05:14:04.1210570Z 2022-05-18T05:14:04.1210998Z ---------------------------------------------------------------------- 2022-05-18T05:14:04.1211600Z Ran 1 test in 6.398s 2022-05-18T05:14:04.1211753Z 2022-05-18T05:14:04.1211863Z OK 2022-05-18T05:14:04.1212001Z 2022-05-18T05:14:04.1212142Z Generating XML reports... 
2022-05-18T05:14:04.1268159Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051357.xml 2022-05-18T05:14:05.3231112Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:14:05.3246438Z 2022-05-18T05:14:05.3246852Z Running tests... 2022-05-18T05:14:05.3247483Z ---------------------------------------------------------------------- 2022-05-18T05:14:06.9591878Z test_multi_device_constructor (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:14:06.9956485Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 82878 2022-05-18T05:14:07.0067999Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 82879 2022-05-18T05:14:07.0179351Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 82880 2022-05-18T05:14:07.0291425Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 82881 2022-05-18T05:14:07.9070303Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:07.9180522Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:07.9184316Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:07.9697194Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:14:08.3341675Z ok (3.009s) 2022-05-18T05:14:08.3341876Z 2022-05-18T05:14:08.3342524Z ---------------------------------------------------------------------- 2022-05-18T05:14:08.3342886Z Ran 1 test in 3.009s 2022-05-18T05:14:08.3343060Z 2022-05-18T05:14:08.3343156Z OK 2022-05-18T05:14:08.3343294Z 2022-05-18T05:14:08.3343436Z Generating XML reports... 2022-05-18T05:14:08.3399086Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051405.xml 2022-05-18T05:14:09.5173287Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:14:09.5188758Z 2022-05-18T05:14:09.5188999Z Running tests... 2022-05-18T05:14:09.5189427Z ---------------------------------------------------------------------- 2022-05-18T05:14:11.1703090Z test_reduce_basics (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:14:11.2067506Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83075 2022-05-18T05:14:11.2178692Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83076 2022-05-18T05:14:11.2293049Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 83077 2022-05-18T05:14:11.2406784Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 83078 2022-05-18T05:14:12.1458267Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:12.1460109Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:12.1468940Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:14:12.1578052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:12.4454723Z ok (2.926s) 2022-05-18T05:14:12.4454951Z 2022-05-18T05:14:12.4455367Z ---------------------------------------------------------------------- 2022-05-18T05:14:12.4455718Z Ran 1 test in 2.927s 2022-05-18T05:14:12.4455892Z 2022-05-18T05:14:12.4455969Z OK 2022-05-18T05:14:12.4456106Z 2022-05-18T05:14:12.4456241Z Generating XML reports... 2022-05-18T05:14:12.4513331Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051409.xml 2022-05-18T05:14:13.6318639Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:14:13.6335192Z 2022-05-18T05:14:13.6335590Z Running tests... 2022-05-18T05:14:13.6336076Z ---------------------------------------------------------------------- 2022-05-18T05:14:15.2746748Z test_reduce_basics_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:14:15.3099708Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83268 2022-05-18T05:14:15.3209300Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83269 2022-05-18T05:14:15.3318429Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 83270 2022-05-18T05:14:15.3429926Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 83271 2022-05-18T05:14:16.2436416Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:16.2438973Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:16.2495126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:16.2563955Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:14:18.2517338Z ok (4.618s) 2022-05-18T05:14:18.2517583Z 2022-05-18T05:14:18.2517962Z ---------------------------------------------------------------------- 2022-05-18T05:14:18.2518308Z Ran 1 test in 4.618s 2022-05-18T05:14:18.2518477Z 2022-05-18T05:14:18.2518574Z OK 2022-05-18T05:14:18.2518744Z 2022-05-18T05:14:18.2518880Z Generating XML reports... 2022-05-18T05:14:18.2577657Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051413.xml 2022-05-18T05:14:19.4399870Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:14:19.4413877Z 2022-05-18T05:14:19.4414118Z Running tests... 
2022-05-18T05:14:19.4414580Z ---------------------------------------------------------------------- 2022-05-18T05:14:21.0426808Z test_reduce_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:14:21.0786494Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83465 2022-05-18T05:14:21.0899858Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83466 2022-05-18T05:14:21.1009341Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 83467 2022-05-18T05:14:21.1121858Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 83468 2022-05-18T05:14:22.0325731Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:22.0374900Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:22.0808492Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:22.0809121Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:14:22.3168529Z ok (2.875s) 2022-05-18T05:14:22.3168811Z 2022-05-18T05:14:22.3169428Z ---------------------------------------------------------------------- 2022-05-18T05:14:22.3169796Z Ran 1 test in 2.875s 2022-05-18T05:14:22.3169948Z 2022-05-18T05:14:22.3170043Z OK 2022-05-18T05:14:22.3170182Z 2022-05-18T05:14:22.3173040Z Generating XML reports... 2022-05-18T05:14:22.3226156Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051419.xml 2022-05-18T05:14:23.4959394Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:14:23.4974933Z 2022-05-18T05:14:23.4975367Z Running tests... 2022-05-18T05:14:23.4976022Z ---------------------------------------------------------------------- 2022-05-18T05:14:25.1325034Z test_reduce_stress (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:14:25.1689373Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83658 2022-05-18T05:14:25.1801518Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83659 2022-05-18T05:14:25.1913696Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 83660 2022-05-18T05:14:25.2028212Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 83661 2022-05-18T05:14:26.0898860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:26.1071685Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:26.1084055Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:14:26.1123526Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:26.8087179Z ok (3.311s) 2022-05-18T05:14:26.8087404Z 2022-05-18T05:14:26.8087828Z ---------------------------------------------------------------------- 2022-05-18T05:14:26.8088161Z Ran 1 test in 3.311s 2022-05-18T05:14:26.8088332Z 2022-05-18T05:14:26.8088431Z OK 2022-05-18T05:14:26.8088578Z 2022-05-18T05:14:26.8088714Z Generating XML reports... 
2022-05-18T05:14:26.8145285Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051423.xml 2022-05-18T05:14:28.0079690Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:14:28.0095302Z 2022-05-18T05:14:28.0095585Z Running tests... 2022-05-18T05:14:28.0096225Z ---------------------------------------------------------------------- 2022-05-18T05:14:29.6641107Z test_reduce_stress_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:14:29.7002730Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 83875 2022-05-18T05:14:29.7113926Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 83876 2022-05-18T05:14:29.7223785Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 83877 2022-05-18T05:14:29.7335554Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 83878 2022-05-18T05:14:30.6092238Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:30.6427155Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:30.6450844Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:14:30.6956235Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:33.4441698Z ok (5.434s) 2022-05-18T05:14:33.4442057Z 2022-05-18T05:14:33.4442467Z ---------------------------------------------------------------------- 2022-05-18T05:14:33.4442797Z Ran 1 test in 5.435s 2022-05-18T05:14:33.4442967Z 2022-05-18T05:14:33.4443063Z OK 2022-05-18T05:14:33.4443571Z 2022-05-18T05:14:33.4443824Z Generating XML reports... 2022-05-18T05:14:33.4498370Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051428.xml 2022-05-18T05:14:34.5995842Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:14:34.6010814Z 2022-05-18T05:14:34.6011057Z Running tests... 2022-05-18T05:14:34.6011492Z ---------------------------------------------------------------------- 2022-05-18T05:14:36.2606107Z test_round_robin (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:14:36.2969408Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84096 2022-05-18T05:14:36.3080935Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84097 2022-05-18T05:14:36.3193386Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 84098 2022-05-18T05:14:36.3308766Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 84099 2022-05-18T05:14:37.2164591Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:37.2225224Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:37.2559652Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:37.2611546Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:14:37.6361055Z ok (3.035s) 2022-05-18T05:14:37.6361261Z 2022-05-18T05:14:37.6362092Z ---------------------------------------------------------------------- 2022-05-18T05:14:37.6362488Z Ran 1 test in 3.035s 2022-05-18T05:14:37.6362664Z 2022-05-18T05:14:37.6362773Z OK 2022-05-18T05:14:37.6362892Z 2022-05-18T05:14:37.6363029Z Generating XML reports... 2022-05-18T05:14:37.6421236Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051434.xml 2022-05-18T05:14:38.8201471Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:14:38.8216407Z 2022-05-18T05:14:38.8216645Z Running tests... 2022-05-18T05:14:38.8217098Z ---------------------------------------------------------------------- 2022-05-18T05:14:40.4831517Z test_round_robin_create_destroy (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:14:40.5195265Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84301 2022-05-18T05:14:40.5308377Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84302 2022-05-18T05:14:40.5421870Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 84303 2022-05-18T05:14:40.5534087Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 84304 2022-05-18T05:14:41.4672187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:14:41.4987627Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:41.5039882Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:41.5342422Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:42.0589866Z ok (3.237s) 2022-05-18T05:14:42.0590150Z 2022-05-18T05:14:42.0590584Z ---------------------------------------------------------------------- 2022-05-18T05:14:42.0590941Z Ran 1 test in 3.237s 2022-05-18T05:14:42.0591487Z 2022-05-18T05:14:42.0591673Z OK 2022-05-18T05:14:42.0591795Z 2022-05-18T05:14:42.0591987Z Generating XML reports... 2022-05-18T05:14:42.0648939Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051438.xml 2022-05-18T05:14:43.2456393Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:14:43.2470430Z 2022-05-18T05:14:43.2470852Z Running tests... 
2022-05-18T05:14:43.2471327Z ---------------------------------------------------------------------- 2022-05-18T05:14:44.8564922Z test_scatter_basics (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:14:44.8922175Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84530 2022-05-18T05:14:44.9030435Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84531 2022-05-18T05:14:44.9141712Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 84532 2022-05-18T05:14:44.9254577Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 84533 2022-05-18T05:14:45.8038788Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:45.8246481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:45.8279172Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:14:45.8285743Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:46.1306692Z ok (2.883s) 2022-05-18T05:14:46.1307041Z 2022-05-18T05:14:46.1307474Z ---------------------------------------------------------------------- 2022-05-18T05:14:46.1307824Z Ran 1 test in 2.884s 2022-05-18T05:14:46.1307995Z 2022-05-18T05:14:46.1308094Z OK 2022-05-18T05:14:46.1308243Z 2022-05-18T05:14:46.1308360Z Generating XML reports... 2022-05-18T05:14:46.1366887Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051443.xml 2022-05-18T05:14:47.3082749Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:14:47.3098228Z 2022-05-18T05:14:47.3098536Z Running tests... 2022-05-18T05:14:47.3098979Z ---------------------------------------------------------------------- 2022-05-18T05:14:48.9615468Z test_scatter_basics_cuda (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:14:48.9979454Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84723 2022-05-18T05:14:49.0091408Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84724 2022-05-18T05:14:49.0204024Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 84725 2022-05-18T05:14:49.0318361Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 84726 2022-05-18T05:14:49.9084557Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:49.9155470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:49.9299339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:49.9771967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:14:51.8405419Z ok (4.530s) 2022-05-18T05:14:51.8405687Z 2022-05-18T05:14:51.8406093Z ---------------------------------------------------------------------- 2022-05-18T05:14:51.8406442Z Ran 1 test in 4.531s 2022-05-18T05:14:51.8406593Z 2022-05-18T05:14:51.8406691Z OK 2022-05-18T05:14:51.8406829Z 2022-05-18T05:14:51.8406968Z Generating XML reports... 
2022-05-18T05:14:51.8463404Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051447.xml 2022-05-18T05:14:53.0205824Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:14:53.0219930Z 2022-05-18T05:14:53.0220311Z Running tests... 2022-05-18T05:14:53.0220874Z ---------------------------------------------------------------------- 2022-05-18T05:14:54.6240683Z test_scatter_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:14:54.6600387Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 84920 2022-05-18T05:14:54.6713883Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 84921 2022-05-18T05:14:54.6824445Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 84922 2022-05-18T05:14:54.6935558Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 84923 2022-05-18T05:14:55.5744994Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:55.5798212Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:55.5932145Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:55.5950556Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:14:55.8984612Z ok (2.876s) 2022-05-18T05:14:55.8984937Z 2022-05-18T05:14:55.8985524Z ---------------------------------------------------------------------- 2022-05-18T05:14:55.8986243Z Ran 1 test in 2.876s 2022-05-18T05:14:55.8986634Z 2022-05-18T05:14:55.8986802Z OK 2022-05-18T05:14:55.8986966Z 2022-05-18T05:14:55.8987108Z Generating XML reports... 2022-05-18T05:14:55.9042584Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051453.xml 2022-05-18T05:14:57.0917967Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:14:57.0933385Z 2022-05-18T05:14:57.0933530Z Running tests... 2022-05-18T05:14:57.0934424Z ---------------------------------------------------------------------- 2022-05-18T05:14:58.6855661Z test_scatter_stress (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:14:58.7210106Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85113 2022-05-18T05:14:58.7320873Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85114 2022-05-18T05:14:58.7432559Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 85115 2022-05-18T05:14:58.7543375Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 85116 2022-05-18T05:14:59.6918235Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:14:59.7059270Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:14:59.7236584Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:14:59.7388069Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:15:00.7610135Z ok (3.667s) 2022-05-18T05:15:00.7610468Z 2022-05-18T05:15:00.7611102Z ---------------------------------------------------------------------- 2022-05-18T05:15:00.7611459Z Ran 1 test in 3.668s 2022-05-18T05:15:00.7611626Z 2022-05-18T05:15:00.7611751Z OK 2022-05-18T05:15:00.7612014Z 2022-05-18T05:15:00.7612238Z Generating XML reports... 2022-05-18T05:15:00.7667619Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051457.xml 2022-05-18T05:15:01.9366684Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:01.9381825Z 2022-05-18T05:15:01.9382066Z Running tests... 2022-05-18T05:15:01.9382528Z ---------------------------------------------------------------------- 2022-05-18T05:15:01.9390118Z test_scatter_stress_cuda (__main__.ProcessGroupGlooTest) ... skip: Test is flaky, see https://github.com/pytorch/pytorch/issues/15963 (0.001s) 2022-05-18T05:15:01.9390481Z 2022-05-18T05:15:01.9391266Z ---------------------------------------------------------------------- 2022-05-18T05:15:01.9391614Z Ran 1 test in 0.001s 2022-05-18T05:15:01.9391781Z 2022-05-18T05:15:01.9391894Z OK (skipped=1) 2022-05-18T05:15:01.9392051Z 2022-05-18T05:15:01.9392179Z Generating XML reports... 2022-05-18T05:15:01.9426035Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051501.xml 2022-05-18T05:15:02.9686878Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:02.9701967Z 2022-05-18T05:15:02.9702443Z Running tests... 2022-05-18T05:15:02.9702976Z ---------------------------------------------------------------------- 2022-05-18T05:15:04.6255552Z test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:15:04.6616631Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85365 2022-05-18T05:15:04.6726905Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85366 2022-05-18T05:15:04.6839301Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 85367 2022-05-18T05:15:04.6953257Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 85368 2022-05-18T05:15:05.5864566Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:05.6005450Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:15:05.6396560Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:05.6871282Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:15:05.9001357Z ok (2.930s) 2022-05-18T05:15:05.9001579Z 2022-05-18T05:15:05.9001989Z ---------------------------------------------------------------------- 2022-05-18T05:15:05.9002321Z Ran 1 test in 2.930s 2022-05-18T05:15:05.9002492Z 2022-05-18T05:15:05.9002608Z OK 2022-05-18T05:15:05.9002745Z 2022-05-18T05:15:05.9002886Z Generating XML reports... 2022-05-18T05:15:05.9061339Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051502.xml 2022-05-18T05:15:07.0672928Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:07.0686744Z 2022-05-18T05:15:07.0687020Z Running tests... 2022-05-18T05:15:07.0687467Z ---------------------------------------------------------------------- 2022-05-18T05:15:07.0692383Z test_sparse_allreduce_basics (__main__.ProcessGroupGlooTest) ... skip: intermittent failures on Windows, in CI (0.000s) 2022-05-18T05:15:07.0692717Z 2022-05-18T05:15:07.0693003Z ---------------------------------------------------------------------- 2022-05-18T05:15:07.0693345Z Ran 1 test in 0.001s 2022-05-18T05:15:07.0693510Z 2022-05-18T05:15:07.0693621Z OK (skipped=1) 2022-05-18T05:15:07.0693780Z 2022-05-18T05:15:07.0693899Z Generating XML reports... 2022-05-18T05:15:07.0727743Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051507.xml 2022-05-18T05:15:08.0913889Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:08.0928591Z 2022-05-18T05:15:08.0928996Z Running tests... 2022-05-18T05:15:08.0929496Z ---------------------------------------------------------------------- 2022-05-18T05:15:09.7619505Z test_sparse_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:15:09.7982449Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85593 2022-05-18T05:15:09.8093251Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85594 2022-05-18T05:15:09.8205915Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 85595 2022-05-18T05:15:09.8317717Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 85596 2022-05-18T05:15:10.7248139Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:10.7248682Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:15:10.7707582Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:15:10.8305609Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:12.9425197Z ok (4.849s) 2022-05-18T05:15:12.9425440Z 2022-05-18T05:15:12.9425848Z ---------------------------------------------------------------------- 2022-05-18T05:15:12.9426205Z Ran 1 test in 4.850s 2022-05-18T05:15:12.9426355Z 2022-05-18T05:15:12.9426453Z OK 2022-05-18T05:15:12.9426591Z 2022-05-18T05:15:12.9426727Z Generating XML reports... 2022-05-18T05:15:12.9482356Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051508.xml 2022-05-18T05:15:14.1217897Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:14.1233632Z 2022-05-18T05:15:14.1233976Z Running tests... 2022-05-18T05:15:14.1234441Z ---------------------------------------------------------------------- 2022-05-18T05:15:15.7303934Z test_sparse_allreduce_checks (__main__.ProcessGroupGlooTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:15:15.7657106Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 85970 2022-05-18T05:15:15.7771835Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 85971 2022-05-18T05:15:15.7881431Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 85972 2022-05-18T05:15:15.7995477Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 85973 2022-05-18T05:15:16.7525672Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:15:16.7530120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:16.7952509Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:16.8101648Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:15:17.1047772Z ok (2.981s) 2022-05-18T05:15:17.1047965Z 2022-05-18T05:15:17.1048515Z ---------------------------------------------------------------------- 2022-05-18T05:15:17.1049039Z Ran 1 test in 2.981s 2022-05-18T05:15:17.1049193Z 2022-05-18T05:15:17.1049297Z OK 2022-05-18T05:15:17.1049437Z 2022-05-18T05:15:17.1049571Z Generating XML reports... 2022-05-18T05:15:17.1108106Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051514.xml 2022-05-18T05:15:18.2829287Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:18.2844894Z 2022-05-18T05:15:18.2845014Z Running tests... 
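[editor's note] The ProcessGroupGlooTest cases above (reduce, scatter, send/recv, sparse allreduce) each spawn four worker processes that join a Gloo process group and run one collective. Below is a minimal sketch of that pattern, not the test harness itself; the local TCP port and tensor values are illustrative assumptions.

import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # Assumed rendezvous on localhost; 29500 is an arbitrary free port.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    # Each rank contributes its rank value; rank 0 receives the sum 0+1+2+3.
    t = torch.full((4,), float(rank))
    dist.reduce(t, dst=0, op=dist.ReduceOp.SUM)
    if rank == 0:
        assert t.eq(6.0).all()

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(4,), nprocs=4)

The *_stress and *_cuda variants in the log presumably repeat the collective many times and run it on CUDA tensors, respectively; the sketch keeps to a single CPU call.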
2022-05-18T05:15:18.2846142Z ---------------------------------------------------------------------- 2022-05-18T05:15:18.2920077Z test_forward_backward (__main__.ReducerTest) ... ok (0.007s) 2022-05-18T05:15:18.2966827Z 2022-05-18T05:15:18.2967204Z ---------------------------------------------------------------------- 2022-05-18T05:15:18.2967572Z Ran 1 test in 0.012s 2022-05-18T05:15:18.2967738Z 2022-05-18T05:15:18.2967836Z OK 2022-05-18T05:15:18.2967985Z 2022-05-18T05:15:18.2968101Z Generating XML reports... 2022-05-18T05:15:18.3003323Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518051518.xml 2022-05-18T05:15:19.2810226Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:19.2831909Z 2022-05-18T05:15:19.2832175Z Running tests... 2022-05-18T05:15:19.2832659Z ---------------------------------------------------------------------- 2022-05-18T05:15:19.2918253Z test_forward_backward_optimizer (__main__.ReducerTest) ... [W reducer.cpp:1258] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator()) 2022-05-18T05:15:19.2936461Z ok (0.011s) 2022-05-18T05:15:19.2950814Z 2022-05-18T05:15:19.2951243Z ---------------------------------------------------------------------- 2022-05-18T05:15:19.2951575Z Ran 1 test in 0.012s 2022-05-18T05:15:19.2951747Z 2022-05-18T05:15:19.2951842Z OK 2022-05-18T05:15:19.2951978Z 2022-05-18T05:15:19.2952107Z Generating XML reports... 2022-05-18T05:15:19.2986635Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518051519.xml 2022-05-18T05:15:20.3131148Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:20.3146801Z 2022-05-18T05:15:20.3147218Z Running tests... 2022-05-18T05:15:20.3147732Z ---------------------------------------------------------------------- 2022-05-18T05:15:20.3223250Z test_forward_backward_unused_parameters (__main__.ReducerTest) ... ok (0.007s) 2022-05-18T05:15:20.3269330Z 2022-05-18T05:15:20.3269763Z ---------------------------------------------------------------------- 2022-05-18T05:15:20.3270114Z Ran 1 test in 0.012s 2022-05-18T05:15:20.3270288Z 2022-05-18T05:15:20.3270383Z OK 2022-05-18T05:15:20.3270506Z 2022-05-18T05:15:20.3270636Z Generating XML reports... 2022-05-18T05:15:20.3303425Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518051520.xml 2022-05-18T05:15:21.3449449Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:21.3464631Z 2022-05-18T05:15:21.3465044Z Running tests... 2022-05-18T05:15:21.3465553Z ---------------------------------------------------------------------- 2022-05-18T05:15:21.3505059Z test_multi_dtype_multi_bucket (__main__.ReducerTest) ... 
ok (0.004s) 2022-05-18T05:15:21.3583394Z 2022-05-18T05:15:21.3583810Z ---------------------------------------------------------------------- 2022-05-18T05:15:21.3584155Z Ran 1 test in 0.012s 2022-05-18T05:15:21.3584332Z 2022-05-18T05:15:21.3584427Z OK 2022-05-18T05:15:21.3584565Z 2022-05-18T05:15:21.3584697Z Generating XML reports... 2022-05-18T05:15:21.3618198Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518051521.xml 2022-05-18T05:15:22.3827888Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:22.3843069Z 2022-05-18T05:15:22.3843366Z Running tests... 2022-05-18T05:15:22.3843825Z ---------------------------------------------------------------------- 2022-05-18T05:15:22.3912513Z test_multi_dtype_single_bucket (__main__.ReducerTest) ... ok (0.007s) 2022-05-18T05:15:22.3962925Z 2022-05-18T05:15:22.3963332Z ---------------------------------------------------------------------- 2022-05-18T05:15:22.3963665Z Ran 1 test in 0.012s 2022-05-18T05:15:22.3963839Z 2022-05-18T05:15:22.3963935Z OK 2022-05-18T05:15:22.3964072Z 2022-05-18T05:15:22.3964202Z Generating XML reports... 2022-05-18T05:15:22.3998447Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518051522.xml 2022-05-18T05:15:23.4096187Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:23.4109664Z 2022-05-18T05:15:23.4109897Z Running tests... 2022-05-18T05:15:23.4110358Z ---------------------------------------------------------------------- 2022-05-18T05:15:23.4144799Z test_single_dtype_single_bucket (__main__.ReducerTest) ... ok (0.003s) 2022-05-18T05:15:23.4226738Z 2022-05-18T05:15:23.4227449Z ---------------------------------------------------------------------- 2022-05-18T05:15:23.4227874Z Ran 1 test in 0.012s 2022-05-18T05:15:23.4228045Z 2022-05-18T05:15:23.4228147Z OK 2022-05-18T05:15:23.4228284Z 2022-05-18T05:15:23.4228396Z Generating XML reports... 2022-05-18T05:15:23.4261396Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518051523.xml 2022-05-18T05:15:24.4443025Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:24.4460126Z 2022-05-18T05:15:24.4460606Z Running tests... 2022-05-18T05:15:24.4461138Z ---------------------------------------------------------------------- 2022-05-18T05:15:26.1147229Z test_logging_init (__main__.RendezvousEnvTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:15:26.1288717Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:15:26.1289563Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:15:26.1387508Z ok (1.693s) 2022-05-18T05:15:26.1388418Z 2022-05-18T05:15:26.1388746Z ---------------------------------------------------------------------- 2022-05-18T05:15:26.1389101Z Ran 1 test in 1.693s 2022-05-18T05:15:26.1389271Z 2022-05-18T05:15:26.1389351Z OK 2022-05-18T05:15:26.1389492Z 2022-05-18T05:15:26.1389635Z Generating XML reports... 
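[editor's note] The reducer.cpp warning emitted during test_forward_backward_optimizer above reflects a DistributedDataParallel construction choice, not a failure: find_unused_parameters=True forces an extra traversal of the autograd graph every iteration, which is wasted work when every parameter always receives a gradient. A self-contained sketch of where that flag lives, assuming a single-rank Gloo group and an arbitrary free port:

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process "world" just to make the sketch runnable on its own.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")  # assumed free port
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(8, 4)

# find_unused_parameters=True is only needed when the forward pass can skip
# parameters (data-dependent control flow).  For a model like this every
# parameter gets a gradient, so the default False avoids the extra autograd
# graph traversal the warning above describes.
ddp_model = DDP(model, find_unused_parameters=False)

loss = ddp_model(torch.randn(2, 8)).sum()
loss.backward()

dist.destroy_process_group()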
2022-05-18T05:15:26.1423827Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-RendezvousEnvTest-20220518051524.xml 2022-05-18T05:15:27.3102636Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_gloo 2022-05-18T05:15:27.3117490Z 2022-05-18T05:15:27.3117724Z Running tests... 2022-05-18T05:15:27.3118177Z ---------------------------------------------------------------------- 2022-05-18T05:15:28.9671561Z test_default_store_timeout_gloo (__main__.TimeoutTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:15:28.9791642Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/74714 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. (1.667s) 2022-05-18T05:15:28.9792296Z 2022-05-18T05:15:28.9792701Z ---------------------------------------------------------------------- 2022-05-18T05:15:28.9793166Z Ran 1 test in 1.667s 2022-05-18T05:15:28.9793331Z 2022-05-18T05:15:28.9793453Z OK (skipped=1) 2022-05-18T05:15:28.9793612Z 2022-05-18T05:15:28.9793739Z Generating XML reports... 2022-05-18T05:15:28.9826214Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_gloo/TEST-TimeoutTest-20220518051527.xml 2022-05-18T05:15:29.3637553Z Running distributed/fsdp/test_fsdp_summon_full_params ... [2022-05-18 05:15:29.363207] 2022-05-18T05:15:29.3638356Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_summon_full_params.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:15:29.363313] 2022-05-18T05:15:30.2715287Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params 2022-05-18T05:15:30.2741131Z 2022-05-18T05:15:30.2741378Z Running tests... 2022-05-18T05:15:30.2741818Z ---------------------------------------------------------------------- 2022-05-18T05:15:31.8958418Z test_cannot_summon_full_params_from_backward (__main__.TestSummonFullParams) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:15:31.9320591Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86519 2022-05-18T05:15:31.9435777Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86520 2022-05-18T05:15:32.8514818Z dist init r=1, world=2 2022-05-18T05:15:32.8518348Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:15:32.8567980Z dist init r=0, world=2 2022-05-18T05:15:32.8573028Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:15:32.8574376Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:32.8621631Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:15:34.2259769Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:34.2260338Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:34.2474622Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:15:34.2475305Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:15:34.2507845Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:15:34.2508670Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:15:34.5411487Z Asserting FSDP instance is: FullyShardedDataParallel( 2022-05-18T05:15:34.5411912Z (_fsdp_wrapped_module): FlattenParamsWrapper( 2022-05-18T05:15:34.5412291Z (_fpw_module): Linear(in_features=2, out_features=1, bias=True) 2022-05-18T05:15:34.5412591Z ) 2022-05-18T05:15:34.5412783Z ) 2022-05-18T05:15:34.5413157Z ERROR: expected to be in states [] but current state is TrainingState_.BACKWARD_PRE 2022-05-18T05:15:34.5415506Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py", line 222, in bad_backwards_hook 2022-05-18T05:15:34.5415967Z with model.summon_full_params(model): 2022-05-18T05:15:34.5416341Z File "/opt/conda/lib/python3.7/contextlib.py", line 112, in __enter__ 2022-05-18T05:15:34.5416699Z return next(self.gen) 2022-05-18T05:15:34.5417375Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2495, in summon_full_params 2022-05-18T05:15:34.5417818Z offload_to_cpu=offload_to_cpu, 2022-05-18T05:15:34.5418178Z File "/opt/conda/lib/python3.7/contextlib.py", line 427, in enter_context 2022-05-18T05:15:34.5418526Z result = _cm_type.__enter__(cm) 2022-05-18T05:15:34.5418861Z File "/opt/conda/lib/python3.7/contextlib.py", line 112, in __enter__ 2022-05-18T05:15:34.5419186Z return next(self.gen) 2022-05-18T05:15:34.5419750Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2344, in _summon_full_params 2022-05-18T05:15:34.5420165Z offload_to_cpu=offload_to_cpu, 2022-05-18T05:15:34.5420530Z File "/opt/conda/lib/python3.7/contextlib.py", line 427, in enter_context 2022-05-18T05:15:34.5420874Z result = _cm_type.__enter__(cm) 2022-05-18T05:15:34.5421203Z File "/opt/conda/lib/python3.7/contextlib.py", line 112, in __enter__ 2022-05-18T05:15:34.5421527Z return next(self.gen) 2022-05-18T05:15:34.5422089Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2354, in _summon_full_params 2022-05-18T05:15:34.5422543Z self._assert_state([TrainingState_.IDLE]) 2022-05-18T05:15:34.5423103Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 3298, in _assert_state 2022-05-18T05:15:34.5423520Z traceback.print_stack() 2022-05-18T05:15:34.8515714Z ok (4.577s) 2022-05-18T05:15:34.8648502Z test_cannot_summon_full_params_from_forward (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86606 2022-05-18T05:15:34.8757794Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86607 2022-05-18T05:15:35.7927527Z dist init r=0, world=2 2022-05-18T05:15:35.7931045Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:15:35.7990065Z dist init r=1, world=2 2022-05-18T05:15:35.7995354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:15:35.7996313Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:35.8034706Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:37.1625860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:37.1626397Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:37.1639372Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:15:37.1640055Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:15:37.1640899Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:15:37.1641539Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:15:37.1678517Z Asserting FSDP instance is: FullyShardedDataParallel( 2022-05-18T05:15:37.1678932Z (_fsdp_wrapped_module): FlattenParamsWrapper( 2022-05-18T05:15:37.1679237Z (_fpw_module): MyModule() 2022-05-18T05:15:37.1679491Z ) 2022-05-18T05:15:37.1679706Z ) 2022-05-18T05:15:37.1680260Z ERROR: expected to be in states [] but current state is TrainingState_.FORWARD 2022-05-18T05:15:37.1691122Z File "", line 1, in 2022-05-18T05:15:37.1691521Z File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main 2022-05-18T05:15:37.1691914Z exitcode = _main(fd) 2022-05-18T05:15:37.1692476Z File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 118, in _main 2022-05-18T05:15:37.1692854Z return self._bootstrap() 2022-05-18T05:15:37.1693235Z File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap 2022-05-18T05:15:37.1693565Z self.run() 2022-05-18T05:15:37.1694046Z File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 99, in run 2022-05-18T05:15:37.1694500Z self._target(*self._args, **self._kwargs) 2022-05-18T05:15:37.1695029Z File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_fsdp.py", line 429, in _run 2022-05-18T05:15:37.1695442Z self.run_test(test_name, pipe) 2022-05-18T05:15:37.1696200Z File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 618, in run_test 2022-05-18T05:15:37.1696613Z getattr(self, test_name)() 2022-05-18T05:15:37.1697124Z File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 499, in wrapper 2022-05-18T05:15:37.1697604Z fn() 2022-05-18T05:15:37.1698210Z File 
"/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 141, in wrapper 2022-05-18T05:15:37.1698591Z return func(*args, **kwargs) 2022-05-18T05:15:37.1699056Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py", line 213, in test_cannot_summon_full_params_from_forward 2022-05-18T05:15:37.1699699Z model(model) 2022-05-18T05:15:37.1700187Z File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl 2022-05-18T05:15:37.1700807Z return forward_call(*input, **kwargs) 2022-05-18T05:15:37.1701640Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2246, in forward 2022-05-18T05:15:37.1702074Z outputs = self.module(*args, **kwargs) 2022-05-18T05:15:37.1702588Z File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl 2022-05-18T05:15:37.1703210Z return forward_call(*input, **kwargs) 2022-05-18T05:15:37.1703829Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/flatten_params_wrapper.py", line 476, in forward 2022-05-18T05:15:37.1704268Z return self.module(*inputs, **kwinputs) 2022-05-18T05:15:37.1704850Z File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl 2022-05-18T05:15:37.1705352Z return forward_call(*input, **kwargs) 2022-05-18T05:15:37.1705774Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_summon_full_params.py", line 206, in forward 2022-05-18T05:15:37.1706215Z with fsdp_module.summon_full_params(fsdp_module): 2022-05-18T05:15:37.1706696Z File "/opt/conda/lib/python3.7/contextlib.py", line 112, in __enter__ 2022-05-18T05:15:37.1707111Z return next(self.gen) 2022-05-18T05:15:37.1707678Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2495, in summon_full_params 2022-05-18T05:15:37.1708096Z offload_to_cpu=offload_to_cpu, 2022-05-18T05:15:37.1708655Z File "/opt/conda/lib/python3.7/contextlib.py", line 427, in enter_context 2022-05-18T05:15:37.1709012Z result = _cm_type.__enter__(cm) 2022-05-18T05:15:37.1709349Z File "/opt/conda/lib/python3.7/contextlib.py", line 112, in __enter__ 2022-05-18T05:15:37.1709683Z return next(self.gen) 2022-05-18T05:15:37.1710456Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2344, in _summon_full_params 2022-05-18T05:15:37.1710901Z offload_to_cpu=offload_to_cpu, 2022-05-18T05:15:37.1711244Z File "/opt/conda/lib/python3.7/contextlib.py", line 427, in enter_context 2022-05-18T05:15:37.1711592Z result = _cm_type.__enter__(cm) 2022-05-18T05:15:37.1712087Z File "/opt/conda/lib/python3.7/contextlib.py", line 112, in __enter__ 2022-05-18T05:15:37.1712456Z return next(self.gen) 2022-05-18T05:15:37.1713025Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 2354, in _summon_full_params 2022-05-18T05:15:37.1713481Z self._assert_state([TrainingState_.IDLE]) 2022-05-18T05:15:37.1714273Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 3298, in _assert_state 2022-05-18T05:15:37.1714674Z traceback.print_stack() 2022-05-18T05:15:37.4829290Z ok (2.631s) 2022-05-18T05:15:37.4973124Z test_named_parameters_buffers_prefix__recurse_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86689 2022-05-18T05:15:37.5083850Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86690 2022-05-18T05:15:38.4179789Z dist init r=0, world=2 2022-05-18T05:15:38.4183116Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:15:38.4225846Z dist init r=1, world=2 2022-05-18T05:15:38.4230626Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:15:38.4231644Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:38.4286350Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:39.8273231Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:39.8273809Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:40.1153119Z ok (2.632s) 2022-05-18T05:15:40.1293184Z test_named_parameters_buffers_prefix__recurse_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86772 2022-05-18T05:15:40.1402925Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86773 2022-05-18T05:15:41.0745848Z dist init r=0, world=2 2022-05-18T05:15:41.0749396Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:15:41.0864414Z dist init r=1, world=2 2022-05-18T05:15:41.0869151Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:15:41.0870494Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:41.0954417Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:42.4816924Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:42.4817478Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:42.8475875Z ok (2.732s) 2022-05-18T05:15:42.8617880Z test_named_parameters_buffers_prefix_test_prefix_recurse_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86855 2022-05-18T05:15:42.8731892Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86856 2022-05-18T05:15:43.7319005Z dist init r=1, world=2 2022-05-18T05:15:43.7322284Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:15:43.7362768Z dist init r=0, world=2 2022-05-18T05:15:43.7367544Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:15:43.7368734Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:43.7425492Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:15:45.1341701Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:45.1342233Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:45.4804485Z ok (2.633s) 2022-05-18T05:15:45.4947414Z test_named_parameters_buffers_prefix_test_prefix_recurse_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 86938 2022-05-18T05:15:45.5058563Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 86939 2022-05-18T05:15:46.4391504Z dist init r=1, world=2 2022-05-18T05:15:46.4395013Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:15:46.4476689Z dist init r=0, world=2 2022-05-18T05:15:46.4481181Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:15:46.4482266Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:46.4498131Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:47.8319970Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:47.8320494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:48.1129192Z ok (2.632s) 2022-05-18T05:15:48.1272172Z test_params_are_unflattenned_rank0_only_False_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87021 2022-05-18T05:15:48.1380970Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87022 2022-05-18T05:15:49.0797186Z dist init r=0, world=2 2022-05-18T05:15:49.0800401Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:15:49.0970950Z dist init r=1, world=2 2022-05-18T05:15:49.0975630Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:15:49.0976679Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:49.1005015Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:50.4714656Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:50.4715196Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:50.7451482Z ok (2.632s) 2022-05-18T05:15:50.7601439Z test_params_are_unflattenned_rank0_only_False_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87104 2022-05-18T05:15:50.7710093Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87105 2022-05-18T05:15:51.7213644Z dist init r=0, world=2 2022-05-18T05:15:51.7217649Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:15:51.7657059Z dist init r=1, world=2 2022-05-18T05:15:51.7661626Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:15:51.7662542Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:51.7727448Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:53.1686044Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:53.1686970Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:53.4791451Z ok (2.734s) 2022-05-18T05:15:53.4940952Z test_params_are_unflattenned_rank0_only_False_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87187 2022-05-18T05:15:53.5048489Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87188 2022-05-18T05:15:54.4234962Z dist init r=0, world=2 2022-05-18T05:15:54.4238865Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:15:54.4543213Z dist init r=1, world=2 2022-05-18T05:15:54.4547740Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:15:54.4548876Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:54.4647108Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:55.8336741Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:55.8337503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:55.8576988Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2311: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T05:15:55.8577762Z "offload_to_cpu and rank0_only=False will result in " 2022-05-18T05:15:55.8610192Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2311: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T05:15:55.8611626Z "offload_to_cpu and rank0_only=False will result in " 2022-05-18T05:15:56.1121078Z ok (2.633s) 2022-05-18T05:15:56.1266491Z test_params_are_unflattenned_rank0_only_False_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87270 2022-05-18T05:15:56.1375551Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87271 2022-05-18T05:15:57.0909466Z dist init r=1, world=2 2022-05-18T05:15:57.0913156Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:15:57.1203696Z dist init r=0, world=2 2022-05-18T05:15:57.1208510Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:15:57.1210158Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:57.1220076Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:58.5102282Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:15:58.5103021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:15:58.5336097Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2311: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T05:15:58.5337273Z "offload_to_cpu and rank0_only=False will result in " 2022-05-18T05:15:58.5339327Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2311: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T05:15:58.5340073Z "offload_to_cpu and rank0_only=False will result in " 2022-05-18T05:15:58.8449357Z ok (2.733s) 2022-05-18T05:15:58.8596833Z test_params_are_unflattenned_rank0_only_True_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87353 2022-05-18T05:15:58.8705933Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87354 2022-05-18T05:15:59.8158479Z dist init r=0, world=2 2022-05-18T05:15:59.8161750Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:15:59.8489096Z dist init r=1, world=2 2022-05-18T05:15:59.8493734Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:15:59.8494625Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:15:59.8569990Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:01.2256182Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:01.2256866Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:01.5779363Z ok (2.733s) 2022-05-18T05:16:01.5925307Z test_params_are_unflattenned_rank0_only_True_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... 
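[editor's note] The UserWarning repeated above spells out the recommended pairing: offload_to_cpu=True together with rank0_only=True, so that only rank 0 materializes a CPU copy of the full parameters instead of every process on the machine. Continuing the sketch above (same model and process group); writeback=False is an added assumption, since the warning text itself only names the two flags.

# Only rank 0 holds the full, CPU-resident parameters inside this block.
with model.summon_full_params(model, rank0_only=True, offload_to_cpu=True,
                              writeback=False):
    if dist.get_rank() == 0:
        cpu_copy = {name: p.detach().clone()
                    for name, p in model.named_parameters()}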
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87436 2022-05-18T05:16:01.6038311Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87437 2022-05-18T05:16:02.5429765Z dist init r=0, world=2 2022-05-18T05:16:02.5432671Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:02.5591158Z dist init r=1, world=2 2022-05-18T05:16:02.5596172Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:02.5597296Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:02.5638869Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:03.9276942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:03.9277503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:04.2110263Z ok (2.633s) 2022-05-18T05:16:04.2255167Z test_params_are_unflattenned_rank0_only_True_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87519 2022-05-18T05:16:04.2363368Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87520 2022-05-18T05:16:05.1477859Z dist init r=0, world=2 2022-05-18T05:16:05.1481489Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:05.1777863Z dist init r=1, world=2 2022-05-18T05:16:05.1782239Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:05.1783042Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:05.1787782Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:06.5822739Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:06.5823276Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:06.8435873Z ok (2.632s) 2022-05-18T05:16:06.8581547Z test_params_are_unflattenned_rank0_only_True_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87602 2022-05-18T05:16:06.8689360Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87603 2022-05-18T05:16:07.7747041Z dist init r=1, world=2 2022-05-18T05:16:07.7750175Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:07.7849531Z dist init r=0, world=2 2022-05-18T05:16:07.7854891Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:07.7856075Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:07.7955180Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:16:09.1849436Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:09.1849977Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:09.4762522Z ok (2.633s) 2022-05-18T05:16:09.4906980Z test_params_count_and_value_rank0_only_False_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87685 2022-05-18T05:16:09.5016955Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87686 2022-05-18T05:16:10.4045990Z dist init r=1, world=2 2022-05-18T05:16:10.4049133Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:10.4091160Z dist init r=0, world=2 2022-05-18T05:16:10.4096363Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:10.4097397Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:10.4152665Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:11.7897014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:11.7897648Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:12.1092443Z ok (2.633s) 2022-05-18T05:16:12.1238162Z test_params_count_and_value_rank0_only_False_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87768 2022-05-18T05:16:12.1351889Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87769 2022-05-18T05:16:13.0709109Z dist init r=0, world=2 2022-05-18T05:16:13.0713034Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:13.0919927Z dist init r=1, world=2 2022-05-18T05:16:13.0924641Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:13.0926030Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:13.1019913Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:14.4719754Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:14.4720304Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:14.8425236Z ok (2.733s) 2022-05-18T05:16:14.8575570Z test_params_count_and_value_rank0_only_False_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87851 2022-05-18T05:16:14.8688586Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87852 2022-05-18T05:16:15.7666883Z dist init r=1, world=2 2022-05-18T05:16:15.7670514Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:15.7777670Z dist init r=0, world=2 2022-05-18T05:16:15.7782819Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:15.7783876Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:15.7875554Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:17.1608424Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:17.1608967Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:17.4759719Z ok (2.633s) 2022-05-18T05:16:17.4905041Z test_params_count_and_value_rank0_only_False_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 87934 2022-05-18T05:16:17.5013970Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 87935 2022-05-18T05:16:18.4143592Z dist init r=0, world=2 2022-05-18T05:16:18.4147456Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:18.4537616Z dist init r=1, world=2 2022-05-18T05:16:18.4542228Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:18.4543040Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:18.4555675Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:19.8475076Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:19.8475610Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:20.1085095Z ok (2.632s) 2022-05-18T05:16:20.1231085Z test_params_count_and_value_rank0_only_True_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88017 2022-05-18T05:16:20.1341185Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88018 2022-05-18T05:16:21.0337843Z dist init r=0, world=2 2022-05-18T05:16:21.0341031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:21.0435823Z dist init r=1, world=2 2022-05-18T05:16:21.0440540Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:21.0441683Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:21.0444511Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:16:22.4114107Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:22.4115096Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:22.7410572Z ok (2.632s) 2022-05-18T05:16:22.7555994Z test_params_count_and_value_rank0_only_True_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88100 2022-05-18T05:16:22.7666013Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88101 2022-05-18T05:16:23.7152935Z dist init r=0, world=2 2022-05-18T05:16:23.7156466Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:23.7496796Z dist init r=1, world=2 2022-05-18T05:16:23.7501663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:23.7502496Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:23.7564742Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:25.1396680Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:25.1397267Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:25.4738776Z ok (2.733s) 2022-05-18T05:16:25.4885292Z test_params_count_and_value_rank0_only_True_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88183 2022-05-18T05:16:25.4995755Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88184 2022-05-18T05:16:26.4029686Z dist init r=0, world=2 2022-05-18T05:16:26.4033070Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:26.4530354Z dist init r=1, world=2 2022-05-18T05:16:26.4534670Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:26.4535482Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:26.4542708Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:27.8625221Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:27.8626186Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:28.2070304Z ok (2.733s) 2022-05-18T05:16:28.2220431Z test_params_count_and_value_rank0_only_True_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88266 2022-05-18T05:16:28.2335379Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88267 2022-05-18T05:16:29.1436893Z dist init r=0, world=2 2022-05-18T05:16:29.1440215Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:29.1492811Z dist init r=1, world=2 2022-05-18T05:16:29.1497358Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:29.1498302Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:29.1543349Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:30.5278210Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:30.5278750Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:30.8406451Z ok (2.633s) 2022-05-18T05:16:30.8539728Z test_raises_rank0_with_writeback (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88349 2022-05-18T05:16:30.8648905Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88350 2022-05-18T05:16:31.7588071Z dist init r=1, world=2 2022-05-18T05:16:31.7591126Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:31.7726192Z dist init r=0, world=2 2022-05-18T05:16:31.7730972Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:31.7732123Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:31.7796138Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:33.1720339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:33.1720873Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:33.4721296Z ok (2.631s) 2022-05-18T05:16:33.4873498Z test_reshard_outside_forward_backward_iteration_rank0_only_False_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88432 2022-05-18T05:16:33.4983695Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88433 2022-05-18T05:16:34.3928081Z dist init r=0, world=2 2022-05-18T05:16:34.3931680Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:34.4030402Z dist init r=1, world=2 2022-05-18T05:16:34.4035901Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:34.4036713Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:34.4137606Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:16:35.7618630Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:35.7619170Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:35.7831950Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:35.7832663Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:35.7833515Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:35.7834157Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:36.4062526Z ok (2.934s) 2022-05-18T05:16:36.4215199Z test_reshard_outside_forward_backward_iteration_rank0_only_False_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88519 2022-05-18T05:16:36.4325848Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88520 2022-05-18T05:16:37.3451054Z dist init r=1, world=2 2022-05-18T05:16:37.3454869Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:37.3539615Z dist init r=0, world=2 2022-05-18T05:16:37.3544134Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:37.3545184Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:37.3557813Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:38.7377832Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:38.7378373Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:38.7593190Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:38.7593867Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:38.7627772Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:38.7628423Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:39.3403573Z ok (2.934s) 2022-05-18T05:16:39.3555440Z test_reshard_outside_forward_backward_iteration_rank0_only_False_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... 
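[editor's note] The UserWarning above is emitted when an FSDP-wrapped module's parameters still live on CPU: FSDP temporarily moves them to the current CUDA device for parameter verification, flattening, and sharding, then moves them back. A minimal sketch of how a caller can avoid the warning by placing the module on the GPU before wrapping it; the model, dimensions, and process-group setup below are hypothetical illustrations, not taken from this test run:

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Assumes the process group is already initialized, e.g.
    # dist.init_process_group("nccl") with one process per GPU.
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = nn.Linear(8, 8)
    model = model.to(torch.cuda.current_device())  # move to GPU first; wrapping a CPU-resident module triggers the warning above
    fsdp_model = FSDP(model)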
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88606 2022-05-18T05:16:39.3665431Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88607 2022-05-18T05:16:40.2895580Z dist init r=0, world=2 2022-05-18T05:16:40.2898929Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:40.2942578Z dist init r=1, world=2 2022-05-18T05:16:40.2947405Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:40.2948533Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:40.3002156Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:41.6590073Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:41.6591108Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:41.6793669Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:41.6795173Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:41.6796867Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:41.6798106Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:41.9747891Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2311: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T05:16:41.9749462Z "offload_to_cpu and rank0_only=False will result in " 2022-05-18T05:16:41.9752371Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2311: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T05:16:41.9753833Z "offload_to_cpu and rank0_only=False will result in " 2022-05-18T05:16:42.2743516Z ok (2.934s) 2022-05-18T05:16:42.2894930Z test_reshard_outside_forward_backward_iteration_rank0_only_False_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88693 2022-05-18T05:16:42.3007007Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88694 2022-05-18T05:16:43.1729901Z dist init r=0, world=2 2022-05-18T05:16:43.1733550Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:43.1887618Z dist init r=1, world=2 2022-05-18T05:16:43.1892452Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:43.1893401Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:43.1938897Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:44.5844106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:44.5844646Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:44.6033653Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:44.6034354Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:44.6035381Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:44.6036021Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:44.9017261Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2311: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T05:16:44.9018456Z "offload_to_cpu and rank0_only=False will result in " 2022-05-18T05:16:44.9021336Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2311: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T05:16:44.9022080Z "offload_to_cpu and rank0_only=False will result in " 2022-05-18T05:16:45.2086183Z ok (2.934s) 2022-05-18T05:16:45.2240591Z test_reshard_outside_forward_backward_iteration_rank0_only_True_offload_to_cpu_False_mixed_precision_False (__main__.TestSummonFullParams) ... 
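[editor's note] The second UserWarning above explains why offload_to_cpu=True combined with rank0_only=False redundantly copies the full, unsharded parameters into CPU memory on every rank of the same host, and recommends pairing offload_to_cpu with rank0_only=True instead. A minimal sketch of that recommended usage; the static-method form and exact signature of summon_full_params are assumptions based on the 1.12-era API and have shifted between releases, and fsdp_model is the hypothetical wrapped module from the sketch above:

    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # Gather the full parameters only on rank 0, offloaded to CPU memory, as the
    # warning recommends; writeback is disabled since rank0-only gathers are read-only.
    with FSDP.summon_full_params(fsdp_model, rank0_only=True, offload_to_cpu=True, writeback=False):
        if dist.get_rank() == 0:
            full_numel = sum(p.numel() for p in fsdp_model.parameters())
            print(f"rank 0 sees {full_numel} unsharded parameter elements")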
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88780 2022-05-18T05:16:45.2349821Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88781 2022-05-18T05:16:46.1488098Z dist init r=1, world=2 2022-05-18T05:16:46.1491911Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:46.1542153Z dist init r=0, world=2 2022-05-18T05:16:46.1546595Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:46.1547664Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:46.1595103Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:47.5129274Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:47.5130687Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:47.5352201Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:47.5353562Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:47.5355281Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:47.5356537Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:48.1426280Z ok (2.934s) 2022-05-18T05:16:48.1579390Z test_reshard_outside_forward_backward_iteration_rank0_only_True_offload_to_cpu_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88867 2022-05-18T05:16:48.1690429Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88868 2022-05-18T05:16:49.0901736Z dist init r=0, world=2 2022-05-18T05:16:49.0905435Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:49.1218302Z dist init r=1, world=2 2022-05-18T05:16:49.1223676Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:49.1225109Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:49.1314514Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:50.5274379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:50.5275390Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:50.5473479Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T05:16:50.5474832Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:50.5508573Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:50.5509900Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:51.1771240Z ok (3.034s) 2022-05-18T05:16:51.1929142Z test_reshard_outside_forward_backward_iteration_rank0_only_True_offload_to_cpu_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 88954 2022-05-18T05:16:51.2041700Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 88955 2022-05-18T05:16:52.1193846Z dist init r=0, world=2 2022-05-18T05:16:52.1197327Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:52.1251822Z dist init r=1, world=2 2022-05-18T05:16:52.1257061Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:52.1259146Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:52.1300947Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:53.5030082Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:53.5030626Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:53.5234046Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:53.5234716Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:53.5235563Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:53.5236197Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:54.1120843Z ok (2.935s) 2022-05-18T05:16:54.1276325Z test_reshard_outside_forward_backward_iteration_rank0_only_True_offload_to_cpu_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89041 2022-05-18T05:16:54.1385797Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89042 2022-05-18T05:16:55.0631774Z dist init r=0, world=2 2022-05-18T05:16:55.0635446Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:55.0968100Z dist init r=1, world=2 2022-05-18T05:16:55.0973571Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:55.0975001Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:16:55.1044057Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:56.4808048Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:56.4809056Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:56.5032858Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:56.5034259Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:56.5035963Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:16:56.5037213Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:16:57.1464946Z ok (3.034s) 2022-05-18T05:16:57.1612619Z test_summon_from_non_fsdp (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89128 2022-05-18T05:16:57.1724804Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89129 2022-05-18T05:16:58.0709415Z dist init r=0, world=2 2022-05-18T05:16:58.0712800Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:16:58.0887974Z dist init r=1, world=2 2022-05-18T05:16:58.0892725Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:16:58.0893889Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:58.0917359Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:16:59.4467729Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:16:59.4468282Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:16:59.7795136Z ok (2.633s) 2022-05-18T05:16:59.7937083Z test_summon_full_param_recursive_recurse_False_summon_outer_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89211 2022-05-18T05:16:59.8044310Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89212 2022-05-18T05:17:00.7463966Z dist init r=0, world=2 2022-05-18T05:17:00.7467100Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:00.7945998Z dist init r=1, world=2 2022-05-18T05:17:00.7950665Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:00.7951675Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:00.7976760Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:17:02.1710321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:02.1710890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:02.1912357Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:02.1913068Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:02.1947858Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:02.1949052Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:02.5116709Z ok (2.732s) 2022-05-18T05:17:02.5262239Z test_summon_full_param_recursive_recurse_False_summon_outer_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89294 2022-05-18T05:17:02.5372669Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89295 2022-05-18T05:17:03.4934819Z dist init r=1, world=2 2022-05-18T05:17:03.4938199Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:03.5018152Z dist init r=0, world=2 2022-05-18T05:17:03.5022678Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:03.5023492Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:03.5041370Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:04.8635512Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:04.8636063Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:04.8833043Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:04.8833721Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:04.8867745Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:04.8868411Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:05.1442957Z ok (2.632s) 2022-05-18T05:17:05.1586186Z test_summon_full_param_recursive_recurse_False_summon_outer_True_mixed_precision_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89377 2022-05-18T05:17:05.1695180Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89378 2022-05-18T05:17:06.0858661Z dist init r=1, world=2 2022-05-18T05:17:06.0862381Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:06.1129640Z dist init r=0, world=2 2022-05-18T05:17:06.1134653Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:06.1135613Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:06.1168678Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:07.4867245Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:07.4867820Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:07.5072854Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:07.5073525Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:07.5108861Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:07.5109799Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:07.7767144Z ok (2.632s) 2022-05-18T05:17:07.7910380Z test_summon_full_param_recursive_recurse_False_summon_outer_True_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89460 2022-05-18T05:17:07.8018398Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89461 2022-05-18T05:17:08.7046183Z dist init r=1, world=2 2022-05-18T05:17:08.7049472Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:08.7112784Z dist init r=0, world=2 2022-05-18T05:17:08.7117653Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:08.7119044Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:08.7152577Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:10.0863179Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:10.0863737Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:10.1072232Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T05:17:10.1072948Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:10.1073802Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:10.1074450Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:10.4090197Z ok (2.632s) 2022-05-18T05:17:10.4237031Z test_summon_full_param_recursive_recurse_True_summon_outer_False_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89543 2022-05-18T05:17:10.4348645Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89544 2022-05-18T05:17:11.3047214Z dist init r=1, world=2 2022-05-18T05:17:11.3051293Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:11.3312751Z dist init r=0, world=2 2022-05-18T05:17:11.3317279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:11.3318093Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:11.3358522Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:12.7279371Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:12.7280405Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:12.7301980Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:12.7303354Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:12.7472061Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:12.7474093Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:13.0418401Z ok (2.633s) 2022-05-18T05:17:13.0566291Z test_summon_full_param_recursive_recurse_True_summon_outer_False_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89626 2022-05-18T05:17:13.0677824Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89627 2022-05-18T05:17:14.0096618Z dist init r=0, world=2 2022-05-18T05:17:14.0099889Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:14.0127806Z dist init r=1, world=2 2022-05-18T05:17:14.0132488Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:14.0133675Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:17:14.0202967Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:15.3741577Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:15.3742133Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:15.3951830Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:15.3952494Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:15.3987350Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:15.3988024Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:15.6748577Z ok (2.633s) 2022-05-18T05:17:15.6892362Z test_summon_full_param_recursive_recurse_True_summon_outer_True_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89709 2022-05-18T05:17:15.7000984Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89710 2022-05-18T05:17:16.6403347Z dist init r=1, world=2 2022-05-18T05:17:16.6407463Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:16.6586168Z dist init r=0, world=2 2022-05-18T05:17:16.6590695Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:16.6591955Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:16.6611684Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:18.0657446Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:18.0658021Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:18.0913969Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:18.0914647Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:18.0915475Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:18.0916351Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:18.4081125Z ok (2.733s) 2022-05-18T05:17:18.4224274Z test_summon_full_param_recursive_recurse_True_summon_outer_True_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89792 2022-05-18T05:17:18.4331538Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89793 2022-05-18T05:17:19.3809125Z dist init r=0, world=2 2022-05-18T05:17:19.3813763Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:19.4037793Z dist init r=1, world=2 2022-05-18T05:17:19.4042237Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:19.4043049Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:19.4119828Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:20.7853253Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:20.7853815Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:20.8072814Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:20.8073491Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:20.8107795Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:17:20.8108462Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:17:21.1402551Z ok (2.732s) 2022-05-18T05:17:21.1544879Z test_summon_full_param_shard_value_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89875 2022-05-18T05:17:21.1654060Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89876 2022-05-18T05:17:22.1088253Z dist init r=0, world=2 2022-05-18T05:17:22.1091828Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:22.1291437Z dist init r=1, world=2 2022-05-18T05:17:22.1296749Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:22.1297290Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:22.1297968Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:23.5204960Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:23.5205544Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:23.7725474Z ok (2.632s) 2022-05-18T05:17:23.7863774Z test_summon_full_param_shard_value_mixed_precision_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 89958 2022-05-18T05:17:23.7971697Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 89959 2022-05-18T05:17:24.7342307Z dist init r=0, world=2 2022-05-18T05:17:24.7345857Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:24.7473096Z dist init r=1, world=2 2022-05-18T05:17:24.7477803Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:24.7478616Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:24.7550813Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:26.1046782Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:26.1047331Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:26.4042479Z ok (2.632s) 2022-05-18T05:17:26.4175987Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_False_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90041 2022-05-18T05:17:26.4284157Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90042 2022-05-18T05:17:27.3415769Z dist init r=1, world=2 2022-05-18T05:17:27.3419353Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:27.3439553Z dist init r=0, world=2 2022-05-18T05:17:27.3444457Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:27.3445401Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:27.3522591Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:28.7288426Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:28.7289407Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:29.0353058Z ok (2.631s) 2022-05-18T05:17:29.0487735Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_False_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90124 2022-05-18T05:17:29.0595963Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90125 2022-05-18T05:17:29.9897608Z dist init r=0, world=2 2022-05-18T05:17:29.9900773Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:30.0325456Z dist init r=1, world=2 2022-05-18T05:17:30.0330084Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:30.0331234Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:30.0411009Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:17:31.4128409Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:31.4128932Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:31.7667712Z ok (2.731s) 2022-05-18T05:17:31.7800431Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_True_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90207 2022-05-18T05:17:31.7909183Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90208 2022-05-18T05:17:32.7289477Z dist init r=1, world=2 2022-05-18T05:17:32.7293094Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:32.7498545Z dist init r=0, world=2 2022-05-18T05:17:32.7504226Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:32.7505932Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:32.7600810Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:34.1259137Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:34.1260149Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:34.3979742Z ok (2.631s) 2022-05-18T05:17:34.4111376Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_True_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90290 2022-05-18T05:17:34.4220044Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90291 2022-05-18T05:17:35.3160754Z dist init r=0, world=2 2022-05-18T05:17:35.3164038Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:35.3580829Z dist init r=1, world=2 2022-05-18T05:17:35.3585314Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:35.3586448Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:35.3673987Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:36.7193251Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:36.7193793Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:37.0290537Z ok (2.631s) 2022-05-18T05:17:37.0426073Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_False_modify_outer_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90373 2022-05-18T05:17:37.0534319Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90374 2022-05-18T05:17:37.9735057Z dist init r=0, world=2 2022-05-18T05:17:37.9738817Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:38.0059878Z dist init r=1, world=2 2022-05-18T05:17:38.0064469Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:38.0065268Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:38.0147006Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:39.3873655Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:39.3874301Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:39.6605743Z ok (2.631s) 2022-05-18T05:17:39.6737603Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_False_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90456 2022-05-18T05:17:39.6845225Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90457 2022-05-18T05:17:40.6518394Z dist init r=0, world=2 2022-05-18T05:17:40.6521790Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:40.6666139Z dist init r=1, world=2 2022-05-18T05:17:40.6670905Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:40.6672239Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:40.6726866Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:42.0430321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:42.0431156Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:42.3918466Z ok (2.731s) 2022-05-18T05:17:42.4051104Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_True_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90539 2022-05-18T05:17:42.4158481Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90540 2022-05-18T05:17:43.3143526Z dist init r=0, world=2 2022-05-18T05:17:43.3146470Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:43.3493421Z dist init r=1, world=2 2022-05-18T05:17:43.3498161Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:43.3499147Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:43.3554499Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:17:44.7204318Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:44.7204992Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:45.0228670Z ok (2.631s) 2022-05-18T05:17:45.0361462Z test_summon_full_param_writeback_writeback_False_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_True_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90622 2022-05-18T05:17:45.0471347Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90623 2022-05-18T05:17:45.9808543Z dist init r=1, world=2 2022-05-18T05:17:45.9811444Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:45.9860024Z dist init r=0, world=2 2022-05-18T05:17:45.9864776Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:45.9866115Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:45.9914692Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:47.3769896Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:47.3770779Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:47.6541541Z ok (2.631s) 2022-05-18T05:17:47.6676914Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_False_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90705 2022-05-18T05:17:47.6785337Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90706 2022-05-18T05:17:48.5929454Z dist init r=1, world=2 2022-05-18T05:17:48.5932589Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:48.5937764Z dist init r=0, world=2 2022-05-18T05:17:48.5942446Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:48.5943703Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:48.6036478Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:49.9773874Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:49.9774865Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:50.2854366Z ok (2.631s) 2022-05-18T05:17:50.2987215Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_False_modify_outer_True (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90788 2022-05-18T05:17:50.3096673Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90789 2022-05-18T05:17:51.2253020Z dist init r=1, world=2 2022-05-18T05:17:51.2256685Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:51.2703572Z dist init r=0, world=2 2022-05-18T05:17:51.2708990Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:51.2710403Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:51.2767452Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:52.6404346Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:52.6405363Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:52.9166553Z ok (2.631s) 2022-05-18T05:17:52.9299721Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_True_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90871 2022-05-18T05:17:52.9408327Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90872 2022-05-18T05:17:53.8413941Z dist init r=0, world=2 2022-05-18T05:17:53.8417056Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:53.8567584Z dist init r=1, world=2 2022-05-18T05:17:53.8572243Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:53.8573461Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:53.8622017Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:55.2529828Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:55.2530550Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:55.5477572Z ok (2.631s) 2022-05-18T05:17:55.5610818Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=False)_mixed_precision_True_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 90954 2022-05-18T05:17:55.5719710Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 90955 2022-05-18T05:17:56.4240892Z dist init r=1, world=2 2022-05-18T05:17:56.4243764Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:56.4843979Z dist init r=0, world=2 2022-05-18T05:17:56.4848680Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:56.4849903Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:56.4854423Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:17:57.8794099Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:17:57.8794648Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:17:58.1791094Z ok (2.631s) 2022-05-18T05:17:58.1927232Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_False_modify_outer_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91037 2022-05-18T05:17:58.2036224Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91038 2022-05-18T05:17:59.1101571Z dist init r=1, world=2 2022-05-18T05:17:59.1105123Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:17:59.1348724Z dist init r=0, world=2 2022-05-18T05:17:59.1353460Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:17:59.1354264Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:17:59.1411777Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:00.5283157Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:00.5283716Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:00.8106183Z ok (2.631s) 2022-05-18T05:18:00.8239344Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_False_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91120 2022-05-18T05:18:00.8348648Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91121 2022-05-18T05:18:01.7486567Z dist init r=1, world=2 2022-05-18T05:18:01.7489744Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:01.7633214Z dist init r=0, world=2 2022-05-18T05:18:01.7639053Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:01.7640491Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:01.7695833Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:03.1501253Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:03.1502266Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:03.4420197Z ok (2.631s) 2022-05-18T05:18:03.4551280Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_True_modify_outer_False (__main__.TestSummonFullParams) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91203 2022-05-18T05:18:03.4659136Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91204 2022-05-18T05:18:04.4155420Z dist init r=0, world=2 2022-05-18T05:18:04.4158686Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:04.4217816Z dist init r=1, world=2 2022-05-18T05:18:04.4222258Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:04.4223429Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:04.4262818Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:05.8131236Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:05.8131785Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:06.1731484Z ok (2.731s) 2022-05-18T05:18:06.1866456Z test_summon_full_param_writeback_writeback_True_cpu_offload_CPUOffload(offload_params=True)_mixed_precision_True_modify_outer_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91286 2022-05-18T05:18:06.1974419Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91287 2022-05-18T05:18:07.1306452Z dist init r=1, world=2 2022-05-18T05:18:07.1309612Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:07.1506565Z dist init r=0, world=2 2022-05-18T05:18:07.1511510Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:07.1512651Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:07.1514018Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:08.5504727Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:08.5505250Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:08.8047890Z ok (2.631s) 2022-05-18T05:18:08.8188430Z test_summon_full_params_equivalence_rank0_only_False_offload_to_cpu_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91369 2022-05-18T05:18:08.8296369Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91370 2022-05-18T05:18:09.7798205Z dist init r=0, world=2 2022-05-18T05:18:09.7801288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:09.7968919Z dist init r=1, world=2 2022-05-18T05:18:09.7973621Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:09.7974451Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:09.8006111Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:18:11.1845039Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:11.1845601Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:11.5368308Z ok (2.732s) 2022-05-18T05:18:11.5511322Z test_summon_full_params_equivalence_rank0_only_False_offload_to_cpu_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91452 2022-05-18T05:18:11.5622379Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91453 2022-05-18T05:18:12.4732126Z dist init r=0, world=2 2022-05-18T05:18:12.4735595Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:12.4887493Z dist init r=1, world=2 2022-05-18T05:18:12.4892116Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:12.4893348Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:12.4940617Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:13.8698540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:13.8699398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:13.8994279Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2311: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T05:18:13.8995072Z "offload_to_cpu and rank0_only=False will result in " 2022-05-18T05:18:13.8996101Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2311: UserWarning: offload_to_cpu and rank0_only=False will result in full parameters being redundantly copied to CPU memory for GPUs that reside on the same machine, which may incur the risk of CPU OOM. It is recommended to use ``offload_to_cpu`` with rank0_only=True. 2022-05-18T05:18:13.8996824Z "offload_to_cpu and rank0_only=False will result in " 2022-05-18T05:18:14.1694794Z ok (2.632s) 2022-05-18T05:18:14.1835585Z test_summon_full_params_equivalence_rank0_only_True_offload_to_cpu_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91535 2022-05-18T05:18:14.1944517Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91536 2022-05-18T05:18:15.1273636Z dist init r=0, world=2 2022-05-18T05:18:15.1277081Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:15.1521617Z dist init r=1, world=2 2022-05-18T05:18:15.1526573Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:15.1527709Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:15.1583509Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
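[Annotation, not part of the log] The UserWarning above recommends combining offload_to_cpu with rank0_only=True, so the full parameters are copied to CPU only once, on rank 0, instead of on every GPU of the same host. A minimal sketch of that recommended usage follows; it reuses the single-process assumptions of the earlier sketch, and the parameter names are taken from the warning and test names rather than from any code in this log.

    # Minimal sketch of the usage the warning recommends: gather full parameters
    # on CPU on rank 0 only. Assumes the single-GPU, world_size=1 setup shown earlier.
    import os
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("nccl", rank=0, world_size=1)
    torch.cuda.set_device(0)

    inner = torch.nn.Linear(8, 8).cuda()
    model = FSDP(inner)

    # rank0_only=True + offload_to_cpu=True avoids redundantly materializing the
    # full parameters in CPU memory on every rank that shares the machine.
    with FSDP.summon_full_params(model, rank0_only=True, offload_to_cpu=True,
                                 writeback=False):
        if dist.get_rank() == 0:
            full_weight = inner.weight.detach().clone()  # full weight, gathered on CPU

    dist.destroy_process_group()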
2022-05-18T05:18:16.5256244Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:16.5256775Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:16.8015874Z ok (2.632s) 2022-05-18T05:18:16.8155995Z test_summon_full_params_equivalence_rank0_only_True_offload_to_cpu_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91618 2022-05-18T05:18:16.8265189Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91619 2022-05-18T05:18:17.7723646Z dist init r=1, world=2 2022-05-18T05:18:17.7726836Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:17.7835851Z dist init r=0, world=2 2022-05-18T05:18:17.7840586Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:17.7841735Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:17.7932065Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:19.1592097Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:19.1592665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:19.4344283Z ok (2.633s) 2022-05-18T05:18:19.4482508Z test_summon_full_params_respects_reshard_after_forward_mixed_precision_False (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91701 2022-05-18T05:18:19.4589991Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91702 2022-05-18T05:18:20.4038679Z dist init r=0, world=2 2022-05-18T05:18:20.4041809Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:20.4204944Z dist init r=1, world=2 2022-05-18T05:18:20.4209570Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:20.4210623Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:20.4246762Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:21.7925002Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:21.7925540Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:21.8113880Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:18:21.8114578Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:18:21.8115418Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T05:18:21.8116072Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:18:22.3668039Z ok (2.932s) 2022-05-18T05:18:22.3808614Z test_summon_full_params_respects_reshard_after_forward_mixed_precision_True (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91784 2022-05-18T05:18:22.3919067Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91785 2022-05-18T05:18:23.3010605Z dist init r=1, world=2 2022-05-18T05:18:23.3014005Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:23.3049817Z dist init r=0, world=2 2022-05-18T05:18:23.3055446Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:23.3056928Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:23.3117951Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:24.6905832Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:24.6906853Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:24.7155210Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:18:24.7156606Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:18:24.7158319Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:18:24.7159604Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:18:25.3007626Z ok (2.934s) 2022-05-18T05:18:25.3144720Z test_summon_single_param (__main__.TestSummonFullParams) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91867 2022-05-18T05:18:25.3253303Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 91868 2022-05-18T05:18:26.2395290Z dist init r=0, world=2 2022-05-18T05:18:26.2398389Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:26.2532128Z dist init r=1, world=2 2022-05-18T05:18:26.2537698Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:26.2539125Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:26.2604248Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:18:27.6272355Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:27.6273409Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:27.6473807Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:18:27.6475186Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:18:27.6509127Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:18:27.6510488Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:18:27.9323608Z ok (2.631s) 2022-05-18T05:18:27.9454884Z test_summon_full_param_writeback_writeback_False_modify_outer_False_mixed_precision_False (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91950 2022-05-18T05:18:28.8613532Z dist init r=0, world=1 2022-05-18T05:18:28.8617090Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:28.8618121Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:18:30.1762875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:30.4518914Z ok (2.519s) 2022-05-18T05:18:30.4650860Z test_summon_full_param_writeback_writeback_False_modify_outer_False_mixed_precision_True (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 91992 2022-05-18T05:18:31.3794902Z dist init r=0, world=1 2022-05-18T05:18:31.3798236Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:31.3799027Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:18:32.7070573Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:32.9715758Z ok (2.520s) 2022-05-18T05:18:32.9847140Z test_summon_full_param_writeback_writeback_False_modify_outer_True_mixed_precision_False (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92034 2022-05-18T05:18:33.8983391Z dist init r=0, world=1 2022-05-18T05:18:33.8986654Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:33.8987730Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:18:35.2043611Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:35.4910051Z ok (2.519s) 2022-05-18T05:18:35.5041387Z test_summon_full_param_writeback_writeback_False_modify_outer_True_mixed_precision_True (__main__.TestSummonFullParamsNoShard) ... 
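[Annotation, not part of the log] The repeated "Module is input on CPU" warnings come from fully_sharded_data_parallel.py: FSDP temporarily moves a CPU-resident module to the current CUDA device to verify, flatten, and shard its parameters, then moves it back. Where that round trip is not wanted, placing the module on the target device before wrapping avoids this code path. A short hedged sketch, assuming an already-initialized default process group as in the earlier sketches:

    # Hedged sketch (assumption, not from this log): wrap a module that already
    # lives on the current CUDA device so FSDP does not have to move it from CPU.
    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    device = torch.device("cuda", torch.cuda.current_device())
    module = torch.nn.Linear(8, 8).to(device)   # parameters start on the GPU
    fsdp_module = FSDP(module)                  # no CPU -> GPU -> CPU shuffle needed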
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92076 2022-05-18T05:18:36.4061502Z dist init r=0, world=1 2022-05-18T05:18:36.4065023Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:36.4066109Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:18:37.6868010Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:37.9102016Z ok (2.419s) 2022-05-18T05:18:37.9233122Z test_summon_full_param_writeback_writeback_True_modify_outer_False_mixed_precision_False (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92118 2022-05-18T05:18:38.8471644Z dist init r=0, world=1 2022-05-18T05:18:38.8474897Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:38.8475715Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:18:40.1659052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:40.4297740Z ok (2.519s) 2022-05-18T05:18:40.4432041Z test_summon_full_param_writeback_writeback_True_modify_outer_False_mixed_precision_True (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92160 2022-05-18T05:18:41.3385659Z dist init r=0, world=1 2022-05-18T05:18:41.3389061Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:41.3389872Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:18:42.6581833Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:42.9495873Z ok (2.520s) 2022-05-18T05:18:42.9627728Z test_summon_full_param_writeback_writeback_True_modify_outer_True_mixed_precision_False (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92202 2022-05-18T05:18:43.8748688Z dist init r=0, world=1 2022-05-18T05:18:43.8752314Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:43.8753321Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:18:45.1780159Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:45.4693341Z ok (2.520s) 2022-05-18T05:18:45.4830116Z test_summon_full_param_writeback_writeback_True_modify_outer_True_mixed_precision_True (__main__.TestSummonFullParamsNoShard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92244 2022-05-18T05:18:46.3993495Z dist init r=0, world=1 2022-05-18T05:18:46.3996860Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:46.3997679Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
2022-05-18T05:18:47.7127810Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:47.9895330Z ok (2.520s) 2022-05-18T05:18:47.9895538Z 2022-05-18T05:18:47.9895932Z ---------------------------------------------------------------------- 2022-05-18T05:18:47.9897378Z Ran 73 tests in 197.715s 2022-05-18T05:18:47.9897569Z 2022-05-18T05:18:47.9897670Z OK 2022-05-18T05:18:47.9897807Z 2022-05-18T05:18:47.9899820Z Generating XML reports... 2022-05-18T05:18:48.0012548Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params/TEST-TestSummonFullParams-20220518051530.xml 2022-05-18T05:18:48.0024575Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params/TEST-TestSummonFullParamsNoShard-20220518051530.xml 2022-05-18T05:18:48.2807271Z Running distributed/fsdp/test_fsdp_state_dict ... [2022-05-18 05:18:48.280235] 2022-05-18T05:18:48.2808355Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_state_dict.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:18:48.280334] 2022-05-18T05:18:49.2049319Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_state_dict 2022-05-18T05:18:49.2073375Z 2022-05-18T05:18:49.2073690Z Running tests... 2022-05-18T05:18:49.2074345Z ---------------------------------------------------------------------- 2022-05-18T05:18:49.2129064Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:18:50.8685237Z Tests that we can save a state_dict and load it into a blank model ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:18:50.9058210Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92323 2022-05-18T05:18:50.9171938Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92324 2022-05-18T05:18:51.8119758Z dist init r=1, world=2 2022-05-18T05:18:51.8123076Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:51.8295844Z dist init r=0, world=2 2022-05-18T05:18:51.8300390Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:51.8301330Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:51.8328700Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:53.2351439Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:53.2351975Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:53.6256416Z ok (4.418s) 2022-05-18T05:18:53.6280261Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:18:53.6406183Z Tests that we can save a state_dict and load it into a blank model ... 
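[Annotation, not part of the log] The distributed/fsdp/test_fsdp_state_dict suite starting here parametrizes the "save a state_dict and load it into a blank model" round trip over state_dict_type (state_dict, local_state_dict, sharded_state_dict), CPU offload, fp16, and rank0_and_offload. Below is a minimal, hedged sketch of that round trip, not code from this run: it shows only the FULL_STATE_DICT case under the same single-process assumptions as above, and the StateDictType import path and FSDP.state_dict_type signature may vary across PyTorch versions.

    # Minimal sketch of "save a state_dict and load it into a blank model".
    # Assumptions (not from the log): one CUDA device, world_size=1, and that
    # FSDP.state_dict_type / StateDictType exist with this signature in the
    # installed PyTorch version.
    import os
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
    from torch.distributed.fsdp.fully_sharded_data_parallel import StateDictType

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29502")
    dist.init_process_group("nccl", rank=0, world_size=1)
    torch.cuda.set_device(0)

    model = FSDP(torch.nn.Linear(8, 8).cuda())

    # Choose how state_dict() is materialized; the log also exercises the
    # LOCAL_STATE_DICT and SHARDED_STATE_DICT variants of this context.
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT):
        saved = model.state_dict()

    # Load the saved weights into a freshly constructed ("blank") model
    # wrapped the same way.
    blank = FSDP(torch.nn.Linear(8, 8).cuda())
    with FSDP.state_dict_type(blank, StateDictType.FULL_STATE_DICT):
        blank.load_state_dict(saved)

    dist.destroy_process_group()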
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92406 2022-05-18T05:18:53.6516916Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92407 2022-05-18T05:18:54.5946831Z dist init r=1, world=2 2022-05-18T05:18:54.5950311Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:54.6147443Z dist init r=0, world=2 2022-05-18T05:18:54.6152039Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:54.6153200Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:54.6154626Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:55.9838554Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:55.9839103Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:56.2589788Z ok (2.633s) 2022-05-18T05:18:56.2613285Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:18:56.2739203Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92489 2022-05-18T05:18:56.2849255Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92490 2022-05-18T05:18:57.2137753Z dist init r=1, world=2 2022-05-18T05:18:57.2141510Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:57.2352750Z dist init r=0, world=2 2022-05-18T05:18:57.2356943Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:57.2357789Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:57.2448367Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:58.6099711Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:18:58.6100246Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:18:58.8922195Z ok (2.633s) 2022-05-18T05:18:58.8944782Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:18:58.9066950Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92572 2022-05-18T05:18:58.9175350Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92573 2022-05-18T05:18:59.8232100Z dist init r=1, world=2 2022-05-18T05:18:59.8235364Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:18:59.8279191Z dist init r=0, world=2 2022-05-18T05:18:59.8283957Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:18:59.8284941Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:18:59.8338599Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:01.2243034Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:01.2243696Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:01.5246656Z ok (2.632s) 2022-05-18T05:19:01.5270809Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:19:01.5395960Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92655 2022-05-18T05:19:01.5505754Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92656 2022-05-18T05:19:02.4809677Z dist init r=0, world=2 2022-05-18T05:19:02.4813159Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:02.4894348Z dist init r=1, world=2 2022-05-18T05:19:02.4899335Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:02.4900228Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:02.4915921Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:03.8656386Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:03.8656919Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:04.2581111Z ok (2.733s) 2022-05-18T05:19:04.2605002Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:19:04.2736658Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92738 2022-05-18T05:19:04.2850476Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92739 2022-05-18T05:19:05.2341201Z dist init r=0, world=2 2022-05-18T05:19:05.2344363Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:05.2444699Z dist init r=1, world=2 2022-05-18T05:19:05.2449556Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:05.2450760Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:05.2550326Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:06.6374724Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:06.6375262Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:06.8921993Z ok (2.634s) 2022-05-18T05:19:06.8945131Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:19:06.9068676Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92821 2022-05-18T05:19:06.9181661Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92822 2022-05-18T05:19:07.8498036Z dist init r=1, world=2 2022-05-18T05:19:07.8501348Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:07.8542642Z dist init r=0, world=2 2022-05-18T05:19:07.8547538Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:07.8548397Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:07.8604753Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:09.2296957Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:09.2297519Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:09.5254546Z ok (2.633s) 2022-05-18T05:19:09.5277445Z test_basic_save_and_load_state_dict_state_dict_type_local_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:19:09.5399943Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92904 2022-05-18T05:19:09.5508672Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92905 2022-05-18T05:19:10.4733367Z dist init r=0, world=2 2022-05-18T05:19:10.4737278Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:10.4931803Z dist init r=1, world=2 2022-05-18T05:19:10.4936634Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:10.4937487Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:10.4942141Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:11.8804222Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:11.8804820Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:12.1578556Z ok (2.632s) 2022-05-18T05:19:12.1601854Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:19:12.1726208Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 92987 2022-05-18T05:19:12.1835743Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 92988 2022-05-18T05:19:13.1076670Z dist init r=1, world=2 2022-05-18T05:19:13.1080535Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:13.1429469Z dist init r=0, world=2 2022-05-18T05:19:13.1434119Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:13.1434915Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:13.1488755Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:14.5358764Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:14.5359272Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:14.8909131Z ok (2.733s) 2022-05-18T05:19:14.8932457Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:19:14.9060070Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93070 2022-05-18T05:19:14.9172394Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93071 2022-05-18T05:19:15.8251654Z dist init r=1, world=2 2022-05-18T05:19:15.8255250Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:15.8298201Z dist init r=0, world=2 2022-05-18T05:19:15.8303044Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:15.8304044Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:15.8358521Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:17.2187231Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:17.2187906Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:17.5243173Z ok (2.633s) 2022-05-18T05:19:17.5266952Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:19:17.5396060Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93153 2022-05-18T05:19:17.5508056Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93154 2022-05-18T05:19:18.4640883Z dist init r=1, world=2 2022-05-18T05:19:18.4644887Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:18.5115793Z dist init r=0, world=2 2022-05-18T05:19:18.5120522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:18.5121339Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:18.5154721Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:19.8989871Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:19.8990790Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:20.2579959Z ok (2.733s) 2022-05-18T05:19:20.2603521Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:19:20.2727487Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93236 2022-05-18T05:19:20.2837740Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93237 2022-05-18T05:19:21.2239032Z dist init r=1, world=2 2022-05-18T05:19:21.2242716Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:21.2765171Z dist init r=0, world=2 2022-05-18T05:19:21.2770146Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:21.2771573Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:21.2854033Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:22.6451214Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:22.6451890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:22.8907392Z ok (2.633s) 2022-05-18T05:19:22.8929544Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:19:22.9054278Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93319 2022-05-18T05:19:22.9165198Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93320 2022-05-18T05:19:23.8395906Z dist init r=1, world=2 2022-05-18T05:19:23.8399120Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:23.8416927Z dist init r=0, world=2 2022-05-18T05:19:23.8421254Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:23.8422539Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:23.8502689Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:25.2215365Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:25.2216108Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:25.6240450Z ok (2.733s) 2022-05-18T05:19:25.6265912Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:19:25.6397264Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93402 2022-05-18T05:19:25.6511400Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93403 2022-05-18T05:19:26.5255776Z dist init r=1, world=2 2022-05-18T05:19:26.5259182Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:26.5784220Z dist init r=0, world=2 2022-05-18T05:19:26.5789305Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:26.5790123Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:26.5871687Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:27.9736886Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:27.9737703Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:28.2589924Z ok (2.635s) 2022-05-18T05:19:28.2612532Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:19:28.2741127Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93485 2022-05-18T05:19:28.2851350Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93486 2022-05-18T05:19:29.1981957Z dist init r=0, world=2 2022-05-18T05:19:29.1984319Z dist init r=1, world=2 2022-05-18T05:19:29.1985037Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:29.1988850Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:29.1990309Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:29.2088514Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:30.5606411Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:30.5606941Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:30.8923268Z ok (2.633s) 2022-05-18T05:19:30.8946115Z test_basic_save_and_load_state_dict_state_dict_type_sharded_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:19:30.9070811Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93568 2022-05-18T05:19:30.9180310Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93569 2022-05-18T05:19:31.8562800Z dist init r=1, world=2 2022-05-18T05:19:31.8565765Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:31.8684562Z dist init r=0, world=2 2022-05-18T05:19:31.8690060Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:31.8691894Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:31.8771358Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:33.2454489Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:33.2455492Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:33.5251379Z ok (2.633s) 2022-05-18T05:19:33.5274147Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:19:33.5398061Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93651 2022-05-18T05:19:33.5507388Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93652 2022-05-18T05:19:34.4709014Z dist init r=0, world=2 2022-05-18T05:19:34.4712156Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:34.5138056Z dist init r=1, world=2 2022-05-18T05:19:34.5142639Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:34.5143792Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:34.5222977Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:35.9109470Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:35.9109993Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:36.2581094Z ok (2.733s) 2022-05-18T05:19:36.2604675Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:19:36.2730011Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93734 2022-05-18T05:19:36.2843714Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93735 2022-05-18T05:19:37.2023694Z dist init r=0, world=2 2022-05-18T05:19:37.2027354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:37.2184070Z dist init r=1, world=2 2022-05-18T05:19:37.2188563Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:37.2189669Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:37.2232615Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:38.6105419Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:38.6106458Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:38.9917206Z ok (2.733s) 2022-05-18T05:19:38.9940790Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:19:39.0071851Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93817 2022-05-18T05:19:39.0186625Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93818 2022-05-18T05:19:39.9606960Z dist init r=0, world=2 2022-05-18T05:19:39.9610639Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:39.9753901Z dist init r=1, world=2 2022-05-18T05:19:39.9758347Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:39.9759561Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:39.9816003Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:41.4006990Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:41.4007536Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:41.7264549Z ok (2.735s) 2022-05-18T05:19:41.7287614Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=False)_fp16_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:19:41.7413094Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93900 2022-05-18T05:19:41.7524529Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93901 2022-05-18T05:19:42.6535381Z dist init r=1, world=2 2022-05-18T05:19:42.6538588Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:42.6677381Z dist init r=0, world=2 2022-05-18T05:19:42.6682204Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:42.6683163Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:42.6743634Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:44.0429862Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:44.0430418Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:44.3596662Z ok (2.633s) 2022-05-18T05:19:44.3619610Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:19:44.3744193Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 93983 2022-05-18T05:19:44.3854684Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 93984 2022-05-18T05:19:45.2344370Z dist init r=1, world=2 2022-05-18T05:19:45.2347711Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:45.2574013Z dist init r=0, world=2 2022-05-18T05:19:45.2578623Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:45.2579503Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:45.2654444Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:46.6549252Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:46.6549779Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:46.9933415Z ok (2.634s) 2022-05-18T05:19:46.9955829Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:19:47.0079749Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94066 2022-05-18T05:19:47.0187769Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94067 2022-05-18T05:19:47.9450024Z dist init r=0, world=2 2022-05-18T05:19:47.9453549Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:47.9640849Z dist init r=1, world=2 2022-05-18T05:19:47.9645399Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:47.9646768Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:47.9658296Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:49.3650547Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:49.3651361Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:49.7260669Z ok (2.733s) 2022-05-18T05:19:49.7283930Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:19:49.7412105Z Tests that we can save a state_dict and load it into a blank model ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94149 2022-05-18T05:19:49.7524848Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94150 2022-05-18T05:19:50.6689522Z dist init r=0, world=2 2022-05-18T05:19:50.6693127Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:50.6700644Z dist init r=1, world=2 2022-05-18T05:19:50.6704936Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:50.6706106Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:50.6796440Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:52.0671740Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:52.0672299Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:52.4595920Z ok (2.733s) 2022-05-18T05:19:52.4619390Z test_basic_save_and_load_state_dict_state_dict_type_state_dict_cpu_offload_CPUOffload(offload_params=True)_fp16_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:19:52.4742500Z Tests that we can save a state_dict and load it into a blank model ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94232 2022-05-18T05:19:52.4851400Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94233 2022-05-18T05:19:53.4002508Z dist init r=1, world=2 2022-05-18T05:19:53.4005858Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:53.4514487Z dist init r=0, world=2 2022-05-18T05:19:53.4519235Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:53.4520302Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:53.4617766Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:54.8519363Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:54.8519908Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:55.1925383Z ok (2.733s) 2022-05-18T05:19:55.2064457Z test_fsdp_state_dict_keys_state_dict_type_local_state_dict (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94315 2022-05-18T05:19:55.2175475Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94316 2022-05-18T05:19:56.1259077Z dist init r=1, world=2 2022-05-18T05:19:56.1262327Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:56.1804282Z dist init r=0, world=2 2022-05-18T05:19:56.1809160Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:56.1810718Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:56.1873740Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:57.5759778Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:19:57.5760338Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:19:57.5955620Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:19:57.5956529Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:19:57.5989073Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:19:57.5989743Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:19:57.9248615Z ok (2.732s) 2022-05-18T05:19:57.9386114Z test_fsdp_state_dict_keys_state_dict_type_sharded_state_dict (__main__.TestFSDPStateDict) ... 
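The test_fsdp_state_dict_keys variants announced here run the same key check under each state-dict flavor (local, sharded, and full). On recent PyTorch builds such as the master branch under test, the flavor is chosen with the FSDP.state_dict_type context manager; a minimal sketch, again with a placeholder model and an already-initialized process group assumed:

    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, StateDictType

    fsdp_model = FSDP(nn.Linear(8, 8).cuda())  # assumes a process group is already up

    # Full (unsharded) keys, as in the *_state_dict_type_state_dict variant.
    with FSDP.state_dict_type(fsdp_model, StateDictType.FULL_STATE_DICT):
        full_keys = list(fsdp_model.state_dict().keys())

    # Sharded keys, as in the *_state_dict_type_sharded_state_dict variant.
    with FSDP.state_dict_type(fsdp_model, StateDictType.SHARDED_STATE_DICT):
        sharded_keys = list(fsdp_model.state_dict().keys())

The "Module is input on CPU, we are moving it to <device>" UserWarning that appears in these runs is emitted when FSDP wraps a CPU-resident module; constructing the module on the target CUDA device before wrapping avoids it.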
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94398 2022-05-18T05:19:57.9496894Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94399 2022-05-18T05:19:58.8590751Z dist init r=1, world=2 2022-05-18T05:19:58.8593924Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:19:58.9181649Z dist init r=0, world=2 2022-05-18T05:19:58.9186498Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:19:58.9187596Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:19:58.9205168Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:00.3127865Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:00.3128434Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:00.3352670Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:20:00.3353370Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:20:00.3354238Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:20:00.3354858Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:20:00.6568437Z ok (2.732s) 2022-05-18T05:20:00.6702972Z test_fsdp_state_dict_keys_state_dict_type_state_dict (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94481 2022-05-18T05:20:00.6814168Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94482 2022-05-18T05:20:01.5804820Z dist init r=0, world=2 2022-05-18T05:20:01.5808062Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:01.5899762Z dist init r=1, world=2 2022-05-18T05:20:01.5904590Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:01.5905672Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:01.5910959Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:02.9858158Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:02.9858702Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:03.0113437Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T05:20:03.0114446Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:20:03.0115413Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:20:03.0116047Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:20:03.2885227Z ok (2.631s) 2022-05-18T05:20:03.3019499Z test_fsdp_state_dict_with_activation_checkpoint_checkpoint_wrap_both (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94564 2022-05-18T05:20:03.3128754Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94565 2022-05-18T05:20:04.2359481Z dist init r=0, world=2 2022-05-18T05:20:04.2363012Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:04.2587070Z dist init r=1, world=2 2022-05-18T05:20:04.2591569Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:04.2592389Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:04.2669828Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:05.6475613Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:05.6476163Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:06.0201121Z ok (2.731s) 2022-05-18T05:20:06.0336259Z test_fsdp_state_dict_with_activation_checkpoint_checkpoint_wrap_first (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94647 2022-05-18T05:20:06.0446576Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94648 2022-05-18T05:20:06.9879554Z dist init r=1, world=2 2022-05-18T05:20:06.9882760Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:07.0019821Z dist init r=0, world=2 2022-05-18T05:20:07.0024839Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:07.0025900Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:07.0087834Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:08.3937626Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:08.3938176Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:08.7520772Z ok (2.732s) 2022-05-18T05:20:08.7658119Z test_fsdp_state_dict_with_activation_checkpoint_checkpoint_wrap_second (__main__.TestFSDPStateDict) ... 
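The test_fsdp_state_dict_with_activation_checkpoint_checkpoint_wrap_* cases exercise saving a state_dict when some submodules are also wrapped for activation checkpointing. A rough sketch of that combination, assuming the checkpoint wrapper that at the time lived in the private torch.distributed.algorithms._checkpoint.checkpoint_wrapper module (placeholder model, initialized process group, and a visible GPU are also assumed):

    import torch.nn as nn
    from torch.distributed.algorithms._checkpoint.checkpoint_wrapper import checkpoint_wrapper
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    inner = checkpoint_wrapper(nn.Linear(8, 8))        # activation-checkpointed submodule
    model = FSDP(nn.Sequential(inner, nn.Linear(8, 8)).cuda())
    state = model.state_dict()                         # the tests check this still saves/loads cleanly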
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94730 2022-05-18T05:20:08.7767836Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94731 2022-05-18T05:20:09.6972646Z dist init r=1, world=2 2022-05-18T05:20:09.6976155Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:09.6999276Z dist init r=0, world=2 2022-05-18T05:20:09.7004537Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:09.7005667Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:09.7079315Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:11.0899753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:11.0900308Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:11.4842481Z ok (2.732s) 2022-05-18T05:20:11.4985816Z test_load_activation_checkpointed_module (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94813 2022-05-18T05:20:11.5098330Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94814 2022-05-18T05:20:12.4487179Z dist init r=1, world=2 2022-05-18T05:20:12.4490323Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:12.4718301Z dist init r=0, world=2 2022-05-18T05:20:12.4723398Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:12.4724205Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:12.4796024Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:13.8415397Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:13.8415961Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:13.8698741Z 2022-05-18T05:20:13.8699399Z 2022-05-18T05:20:14.1168392Z ok (2.632s) 2022-05-18T05:20:14.1198585Z test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:20:14.1320996Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94896 2022-05-18T05:20:14.1429383Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94897 2022-05-18T05:20:15.0543028Z dist init r=1, world=2 2022-05-18T05:20:15.0546692Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:15.0817246Z dist init r=0, world=2 2022-05-18T05:20:15.0821560Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:15.0822417Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:20:15.0853044Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:16.4569221Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:16.4569769Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:17.1511359Z ok (3.034s) 2022-05-18T05:20:17.1541862Z test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:20:17.1668811Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 94983 2022-05-18T05:20:17.1781298Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 94984 2022-05-18T05:20:18.0949002Z dist init r=0, world=2 2022-05-18T05:20:18.0952238Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:18.0983299Z dist init r=1, world=2 2022-05-18T05:20:18.0988290Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:18.0989559Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:18.1054508Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:19.4814382Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:19.4815299Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:19.7853870Z ok (2.634s) 2022-05-18T05:20:19.7883821Z test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:20:19.8009501Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95066 2022-05-18T05:20:19.8121209Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95067 2022-05-18T05:20:20.7357678Z dist init r=0, world=2 2022-05-18T05:20:20.7358008Z dist init r=1, world=2 2022-05-18T05:20:20.7360428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:20.7362746Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:20.7363566Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:20.7464096Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:20:22.1001398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:22.1001930Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:22.7200470Z ok (2.934s) 2022-05-18T05:20:22.7230280Z test_save_and_load_after_forward_state_dict_state_dict_type_local_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:20:22.7356641Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95153 2022-05-18T05:20:22.7470543Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95154 2022-05-18T05:20:23.6699236Z dist init r=1, world=2 2022-05-18T05:20:23.6703157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:23.7079672Z dist init r=0, world=2 2022-05-18T05:20:23.7084339Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:23.7085284Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:23.7111559Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:25.0878890Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:25.0879414Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:25.3542223Z ok (2.634s) 2022-05-18T05:20:25.3572367Z test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:20:25.3697070Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95236 2022-05-18T05:20:25.3807519Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95237 2022-05-18T05:20:26.2909510Z dist init r=0, world=2 2022-05-18T05:20:26.2912686Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:26.3450275Z dist init r=1, world=2 2022-05-18T05:20:26.3455570Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:26.3456456Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:26.3524561Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:27.7352415Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:27.7353300Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:28.3888681Z ok (3.034s) 2022-05-18T05:20:28.3919597Z test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:20:28.4047115Z Test that saving after some training results in params being updated as ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95323 2022-05-18T05:20:28.4157787Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95324 2022-05-18T05:20:29.3375336Z dist init r=1, world=2 2022-05-18T05:20:29.3379349Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:29.3672794Z dist init r=0, world=2 2022-05-18T05:20:29.3677526Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:29.3678393Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:29.3685800Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:30.7473412Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:30.7473959Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:31.0230792Z ok (2.634s) 2022-05-18T05:20:31.0260545Z test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:20:31.0384420Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95406 2022-05-18T05:20:31.0494404Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95407 2022-05-18T05:20:31.9815754Z dist init r=1, world=2 2022-05-18T05:20:31.9819969Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:32.0017947Z dist init r=0, world=2 2022-05-18T05:20:32.0022474Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:32.0023446Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:32.0024409Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:33.3912153Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:33.3912688Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:34.0576454Z ok (3.034s) 2022-05-18T05:20:34.0605765Z test_save_and_load_after_forward_state_dict_state_dict_type_sharded_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:20:34.0734126Z Test that saving after some training results in params being updated as ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95493 2022-05-18T05:20:34.0845956Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95494 2022-05-18T05:20:34.9981073Z dist init r=1, world=2 2022-05-18T05:20:34.9984414Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:35.0055457Z dist init r=0, world=2 2022-05-18T05:20:35.0060764Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:35.0062078Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:35.0087140Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:36.4211730Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:36.4212285Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:36.6917157Z ok (2.634s) 2022-05-18T05:20:36.6946937Z test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_False_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:20:36.7070412Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95576 2022-05-18T05:20:36.7179290Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95577 2022-05-18T05:20:37.6464526Z dist init r=1, world=2 2022-05-18T05:20:37.6468663Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:37.6818139Z dist init r=0, world=2 2022-05-18T05:20:37.6822573Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:37.6823489Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:37.6876289Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:39.0612283Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:39.0612817Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:39.7259446Z ok (3.034s) 2022-05-18T05:20:39.7288127Z test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_False_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:20:39.7413078Z Test that saving after some training results in params being updated as ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95663 2022-05-18T05:20:39.7524232Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95664 2022-05-18T05:20:40.6717736Z dist init r=0, world=2 2022-05-18T05:20:40.6721138Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:40.7083002Z dist init r=1, world=2 2022-05-18T05:20:40.7087305Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:40.7088458Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:40.7129589Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:42.1170372Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:42.1170867Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:42.7604053Z ok (3.034s) 2022-05-18T05:20:42.7633261Z test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_True_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:20:42.7760766Z Test that saving after some training results in params being updated as ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95750 2022-05-18T05:20:42.7872475Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95751 2022-05-18T05:20:43.7116497Z dist init r=0, world=2 2022-05-18T05:20:43.7119860Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:43.7126445Z dist init r=1, world=2 2022-05-18T05:20:43.7131473Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:43.7132491Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:43.7222981Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:45.1040206Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:45.1040752Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:45.7953682Z ok (3.035s) 2022-05-18T05:20:45.7983598Z test_save_and_load_after_forward_state_dict_state_dict_type_state_dict_mixed_precision_True_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:20:45.8106513Z Test that saving after some training results in params being updated as ... 
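The test_save_and_load_after_forward_state_dict variants (their docstring is truncated in the log) take the state_dict only after a few optimization steps, so the saved parameters reflect training rather than initialization. A minimal sketch of that ordering, under the same placeholder-model and process-group assumptions as the earlier snippets:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    model = FSDP(nn.Linear(8, 8).cuda())
    optim = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(3):                        # a few forward/backward/step iterations
        loss = model(torch.randn(4, 8).cuda()).sum()
        loss.backward()
        optim.step()
        optim.zero_grad()
    trained_state = model.state_dict()        # captured *after* training
    reloaded = FSDP(nn.Linear(8, 8).cuda())
    reloaded.load_state_dict(trained_state)   # loads the updated parameters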
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95837 2022-05-18T05:20:45.8215372Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95838 2022-05-18T05:20:46.7337367Z dist init r=1, world=2 2022-05-18T05:20:46.7340531Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:46.7356054Z dist init r=0, world=2 2022-05-18T05:20:46.7360728Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:46.7361751Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:46.7444304Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:48.1128528Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:48.1129092Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:48.7293205Z ok (2.934s) 2022-05-18T05:20:48.7316563Z test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:20:48.7439851Z Tests that FSDP's state_dict can be loaded into a local model. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 95924 2022-05-18T05:20:48.7549289Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 95925 2022-05-18T05:20:49.6746427Z dist init r=0, world=2 2022-05-18T05:20:49.6749815Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:49.7066226Z dist init r=1, world=2 2022-05-18T05:20:49.7071020Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:49.7072181Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:49.7158633Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:51.0759045Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:51.0759587Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:51.0996852Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:20:51.0997743Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:20:51.0998700Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:20:51.0999339Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:20:51.7627589Z ok (3.033s) 2022-05-18T05:20:51.7650650Z test_state_dict_load_into_local_module_state_dict_type_sharded_state_dict_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:20:51.7774604Z Tests that FSDP's state_dict can be loaded into a local model. 
... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96011 2022-05-18T05:20:51.7883086Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96012 2022-05-18T05:20:52.7213015Z dist init r=1, world=2 2022-05-18T05:20:52.7215692Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:52.7423058Z dist init r=0, world=2 2022-05-18T05:20:52.7427752Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:52.7429117Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:52.7522592Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:54.1311705Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:54.1312368Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:54.3954607Z ok (2.633s) 2022-05-18T05:20:54.3977683Z test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_False (__main__.TestFSDPStateDict) 2022-05-18T05:20:54.4102716Z Tests that FSDP's state_dict can be loaded into a local model. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96094 2022-05-18T05:20:54.4211838Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96095 2022-05-18T05:20:55.3776393Z dist init r=1, world=2 2022-05-18T05:20:55.3779970Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:55.3899221Z dist init r=0, world=2 2022-05-18T05:20:55.3904074Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:55.3904960Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:55.3984796Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:56.7929035Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:56.7929573Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:56.8157003Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:20:56.8157687Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:20:56.8158518Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:20:56.8159452Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:20:57.4291113Z ok (3.033s) 2022-05-18T05:20:57.4314963Z test_state_dict_load_into_local_module_state_dict_type_state_dict_state_dict_rank0_and_offload_True (__main__.TestFSDPStateDict) 2022-05-18T05:20:57.4442060Z Tests that FSDP's state_dict can be loaded into a local model. ... 
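The test_state_dict_load_into_local_module variants check that a full state_dict taken from an FSDP-wrapped model can be loaded back into a plain, unwrapped copy of the same architecture. A minimal sketch under the same assumptions as above (placeholder nn.Linear model, process group already initialized):

    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    wrapped = FSDP(nn.Linear(8, 8).cuda())
    full_sd = wrapped.state_dict()   # full (unsharded) state_dict by default
    local = nn.Linear(8, 8)          # plain local model, no FSDP wrapping
    local.load_state_dict(full_sd)   # the tests verify these keys line up with the unwrapped module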
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96181 2022-05-18T05:20:57.4553052Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96182 2022-05-18T05:20:58.3785318Z dist init r=1, world=2 2022-05-18T05:20:58.3789030Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:20:58.4225808Z dist init r=0, world=2 2022-05-18T05:20:58.4230753Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:20:58.4231680Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:58.4299304Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:20:59.8102339Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:20:59.8102929Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:20:59.8314810Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:20:59.8315494Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:20:59.8316344Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:20:59.8316972Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:00.4632575Z ok (3.034s) 2022-05-18T05:21:00.4774251Z test_state_dict_rank0_offload_save_load_flow (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96268 2022-05-18T05:21:00.4882653Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96269 2022-05-18T05:21:01.4099351Z dist init r=1, world=2 2022-05-18T05:21:01.4102997Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:21:01.4386905Z dist init r=0, world=2 2022-05-18T05:21:01.4391804Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:21:01.4393098Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:01.4409526Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:02.8241309Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:02.8241886Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:02.8644459Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T05:21:02.8645128Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:02.8645973Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:21:02.8646883Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:03.3958622Z ok (2.932s) 2022-05-18T05:21:03.4090330Z test_state_dict_save_load_flow_state_dict_type_local_state_dict (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96351 2022-05-18T05:21:03.4201234Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96352 2022-05-18T05:21:04.3431553Z dist init r=0, world=2 2022-05-18T05:21:04.3434991Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:21:04.3816616Z dist init r=1, world=2 2022-05-18T05:21:04.3821597Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:21:04.3822945Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:04.3843445Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:05.7665545Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:05.7666088Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:05.7873045Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:21:05.7873812Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:05.7874644Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:21:05.7875291Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:06.1041223Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:21:06.1041776Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:21:06.4279473Z ok (3.032s) 2022-05-18T05:21:06.4408856Z test_state_dict_save_load_flow_state_dict_type_sharded_state_dict (__main__.TestFSDPStateDict) ... 
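Many of the parametrizations above end in state_dict_rank0_and_offload_True, and test_state_dict_rank0_offload_save_load_flow exercises the same idea end to end: only rank 0 materializes the full state_dict, and it is offloaded to CPU. On the builds under test this is expressed with FullStateDictConfig; a hedged sketch with the usual placeholder model and process-group assumptions:

    import torch.nn as nn
    from torch.distributed.fsdp import (
        FullyShardedDataParallel as FSDP,
        FullStateDictConfig,
        StateDictType,
    )

    model = FSDP(nn.Linear(8, 8).cuda())
    cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
    with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
        sd = model.state_dict()  # full tensors gathered only on rank 0, placed on CPU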
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96438 2022-05-18T05:21:06.4517133Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96439 2022-05-18T05:21:07.3698447Z dist init r=1, world=2 2022-05-18T05:21:07.3701812Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:21:07.4107990Z dist init r=0, world=2 2022-05-18T05:21:07.4112568Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:21:07.4113935Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:07.4212169Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:08.7974682Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:08.7975242Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:08.8194540Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:21:08.8195222Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:08.8196413Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:21:08.8197037Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:09.1459536Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:21:09.1460153Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:21:09.4596087Z ok (3.031s) 2022-05-18T05:21:09.4727406Z test_state_dict_save_load_flow_state_dict_type_state_dict (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96525 2022-05-18T05:21:09.4838082Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96526 2022-05-18T05:21:10.3982901Z dist init r=1, world=2 2022-05-18T05:21:10.3986302Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:21:10.4288935Z dist init r=0, world=2 2022-05-18T05:21:10.4293653Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:21:10.4294692Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:10.4395055Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:21:11.8212860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:11.8213394Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:11.8435001Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:21:11.8435701Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:11.8436540Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:21:11.8437184Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:12.1639861Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:21:12.1640392Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:21:12.4918172Z ok (3.032s) 2022-05-18T05:21:12.5079085Z test_state_dict_skip_module_state_dict_type_local_state_dict_double_nest_True (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96612 2022-05-18T05:21:12.5189645Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96613 2022-05-18T05:21:13.4867417Z dist init r=1, world=2 2022-05-18T05:21:13.4870803Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:21:13.4877036Z dist init r=0, world=2 2022-05-18T05:21:13.4881659Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:21:13.4883147Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:13.4974175Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:14.8926248Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:14.8926784Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:15.5268811Z ok (3.035s) 2022-05-18T05:21:15.5426405Z test_state_dict_skip_module_state_dict_type_sharded_state_dict_double_nest_True (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96699 2022-05-18T05:21:15.5534927Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96700 2022-05-18T05:21:16.4773778Z dist init r=1, world=2 2022-05-18T05:21:16.4777238Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:21:16.5253257Z dist init r=0, world=2 2022-05-18T05:21:16.5258001Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:21:16.5258861Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:16.5286990Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:21:17.9041322Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:17.9042466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:18.5612872Z ok (3.034s) 2022-05-18T05:21:18.5772997Z test_state_dict_skip_module_state_dict_type_state_dict_double_nest_True (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96786 2022-05-18T05:21:18.5885324Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96787 2022-05-18T05:21:19.5176454Z dist init r=0, world=2 2022-05-18T05:21:19.5179793Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:21:19.5197636Z dist init r=1, world=2 2022-05-18T05:21:19.5202662Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:21:19.5203600Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:19.5283086Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:20.8983843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:20.8984370Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:21.5965860Z ok (3.035s) 2022-05-18T05:21:21.6104988Z test_state_dict_type (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96873 2022-05-18T05:21:21.6216569Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96874 2022-05-18T05:21:22.5398014Z dist init r=0, world=2 2022-05-18T05:21:22.5401684Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:21:22.5469546Z dist init r=1, world=2 2022-05-18T05:21:22.5474426Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:21:22.5475701Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:22.5504518Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:23.9257931Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:23.9258749Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:24.2286080Z ok (2.632s) 2022-05-18T05:21:24.2436968Z test_state_dict_with_ignored_modules (__main__.TestFSDPStateDict) ... 
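test_state_dict_with_ignored_modules covers the FSDP ignored_modules argument, which excludes the listed submodules from FSDP's parameter flattening and sharding; the test then checks how those parameters show up in the saved state_dict. A minimal sketch with a hypothetical two-layer module (process-group assumptions as before):

    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    class Net(nn.Module):
        def __init__(self) -> None:
            super().__init__()
            self.managed = nn.Linear(8, 8)
            self.ignored = nn.Linear(8, 8)

        def forward(self, x):
            return self.ignored(self.managed(x))

    net = Net().cuda()
    model = FSDP(net, ignored_modules=[net.ignored])  # net.ignored stays a regular, unsharded module
    keys = model.state_dict().keys()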
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 96956 2022-05-18T05:21:24.2547275Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 96957 2022-05-18T05:21:25.1635444Z dist init r=1, world=2 2022-05-18T05:21:25.1638720Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:21:25.1719341Z dist init r=0, world=2 2022-05-18T05:21:25.1724241Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:21:25.1725544Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:25.1741881Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:26.5666584Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:26.5667136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:26.5875157Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:21:26.5875864Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:26.5876701Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:21:26.5877338Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:26.8618099Z ok (2.633s) 2022-05-18T05:21:26.8746895Z test_wrong_state_dict_config (__main__.TestFSDPStateDict) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97039 2022-05-18T05:21:26.8855458Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97040 2022-05-18T05:21:27.8147229Z dist init r=0, world=2 2022-05-18T05:21:27.8150967Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:21:27.8395871Z dist init r=1, world=2 2022-05-18T05:21:27.8400115Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:21:27.8401159Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:27.8458227Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:21:29.2277821Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:29.2278376Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:29.2473585Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T05:21:29.2474297Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:29.2475153Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:21:29.2475789Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:21:29.5930955Z ok (2.731s) 2022-05-18T05:21:29.5931238Z 2022-05-18T05:21:29.5931647Z ---------------------------------------------------------------------- 2022-05-18T05:21:29.5932012Z Ran 57 tests in 160.386s 2022-05-18T05:21:29.5932188Z 2022-05-18T05:21:29.5932289Z OK 2022-05-18T05:21:29.5932434Z 2022-05-18T05:21:29.5932554Z Generating XML reports... 2022-05-18T05:21:29.6048086Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_state_dict/TEST-TestFSDPStateDict-20220518051849.xml 2022-05-18T05:21:29.8726833Z Running distributed/_shard/sharded_tensor/test_sharded_tensor ... [2022-05-18 05:21:29.872143] 2022-05-18T05:21:29.8727986Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/test_sharded_tensor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:21:29.872240] 2022-05-18T05:21:30.8137571Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor 2022-05-18T05:21:30.8180808Z 2022-05-18T05:21:30.8181238Z Running tests... 2022-05-18T05:21:30.8181758Z ---------------------------------------------------------------------- 2022-05-18T05:21:32.4744528Z test_empty (__main__.TestCreateTensorFromParams) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:21:32.4872181Z ok (1.669s) 2022-05-18T05:21:32.5126764Z test_local_tensor (__main__.TestLocalTensor) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97159 2022-05-18T05:21:32.5241601Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97160 2022-05-18T05:21:32.5355009Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97161 2022-05-18T05:21:32.5468971Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97162 2022-05-18T05:21:33.4544433Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:33.4591904Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:33.4836168Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:33.4971078Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:33.6514717Z skip: Need at least 4 CUDA devices (1.164s) 2022-05-18T05:21:33.6652640Z test_local_tensor_error (__main__.TestLocalTensor) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97303 2022-05-18T05:21:33.6764650Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97304 2022-05-18T05:21:33.6877762Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97305 2022-05-18T05:21:33.6992210Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97306 2022-05-18T05:21:34.6934745Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:34.6938084Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:34.7081694Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:34.7236511Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:34.9036306Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:21:34.9173613Z test_collect_local_shard (__main__.TestModuleHookApi) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97447 2022-05-18T05:21:34.9283044Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97448 2022-05-18T05:21:34.9395306Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97449 2022-05-18T05:21:34.9510967Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97450 2022-05-18T05:21:35.8579058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:35.8844238Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:35.9254765Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:35.9267135Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:36.1554929Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:21:36.1699595Z test_reshard_output (__main__.TestModuleHookApi) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97591 2022-05-18T05:21:36.1816280Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97592 2022-05-18T05:21:36.1931137Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97593 2022-05-18T05:21:36.2047282Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97594 2022-05-18T05:21:37.1660070Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:37.2170975Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:37.2335534Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:37.2653432Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:37.4091755Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:21:37.4235182Z test_shard_parameter (__main__.TestShardParameter) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97735 2022-05-18T05:21:37.4346558Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97736 2022-05-18T05:21:37.4460010Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97737 2022-05-18T05:21:37.4573763Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97738 2022-05-18T05:21:38.3842176Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:38.4437928Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:38.4486999Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:38.4551478Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:38.6617331Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:21:38.6764894Z test_shard_parameter_errors (__main__.TestShardParameter) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 97879 2022-05-18T05:21:38.6877828Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 97880 2022-05-18T05:21:38.6988827Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 97881 2022-05-18T05:21:38.7102047Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 97882 2022-05-18T05:21:39.6121344Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:39.6202364Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:39.6495481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:39.6744025Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:39.8144011Z skip: Need at least 4 CUDA devices (1.152s) 2022-05-18T05:21:39.8284836Z test_shard_tensor (__main__.TestShardTensor) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 98023 2022-05-18T05:21:39.8395960Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 98024 2022-05-18T05:21:39.8509181Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 98025 2022-05-18T05:21:39.8627058Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 98026 2022-05-18T05:21:40.7658875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:40.7790428Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:40.8371011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:40.8381593Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:41.0670216Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:21:41.0818174Z test_shard_tensor_errors (__main__.TestShardTensor) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 98167 2022-05-18T05:21:41.0931748Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 98168 2022-05-18T05:21:41.1045313Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 98169 2022-05-18T05:21:41.1158648Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 98170 2022-05-18T05:21:42.0554849Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:42.0645272Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:42.0844360Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:42.0872022Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:42.3202450Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:21:42.3354149Z test_cleanup (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 98311 2022-05-18T05:21:42.3469336Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 98312 2022-05-18T05:21:42.3587466Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 98313 2022-05-18T05:21:42.3704811Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 98314 2022-05-18T05:21:43.2768121Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:43.2842148Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:43.2855996Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:43.3109118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:43.4748397Z skip: Need at least 4 CUDA devices (1.154s) 2022-05-18T05:21:43.4913815Z test_complete_world_size (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 98455 2022-05-18T05:21:43.5030261Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 98456 2022-05-18T05:21:43.5149691Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 98457 2022-05-18T05:21:43.5265139Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 98458 2022-05-18T05:21:44.4394095Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:44.4421104Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:44.4972170Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:44.5082673Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:44.7309572Z skip: Need at least 4 CUDA devices (1.256s) 2022-05-18T05:21:44.7341708Z test_create_sharded_tensor_like (__main__.TestShardedTensorChunked) 2022-05-18T05:21:44.7475197Z Test tensor like methods, i.e. torch.zeros_like(...), torch.full_like, etc. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 98599 2022-05-18T05:21:44.7588328Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 98600 2022-05-18T05:21:44.7703212Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 98601 2022-05-18T05:21:44.7817680Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 98602 2022-05-18T05:21:45.7545000Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:45.7987280Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:45.7990223Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:45.8251207Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:45.9863723Z skip: Need at least 4 CUDA devices (1.255s) 2022-05-18T05:21:45.9879809Z test_create_sharded_tensor_with_full (__main__.TestShardedTensorChunked) 2022-05-18T05:21:46.0004948Z Test sharded_tensor.full(...) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 98743 2022-05-18T05:21:46.0117844Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 98744 2022-05-18T05:21:46.0230149Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 98745 2022-05-18T05:21:46.0345578Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 98746 2022-05-18T05:21:46.9494546Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:46.9535267Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:46.9558782Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:46.9666321Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:47.1386255Z skip: Need at least 4 CUDA devices (1.152s) 2022-05-18T05:21:47.1400869Z test_create_sharded_tensor_with_ones (__main__.TestShardedTensorChunked) 2022-05-18T05:21:47.1524665Z Test sharded_tensor.ones(...) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 98887 2022-05-18T05:21:47.1635323Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 98888 2022-05-18T05:21:47.1748985Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 98889 2022-05-18T05:21:47.1863127Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 98890 2022-05-18T05:21:48.1650441Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:48.1885344Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:48.2036589Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:48.2205183Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:48.3905834Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:21:48.3927958Z test_create_sharded_tensor_with_rand (__main__.TestShardedTensorChunked) 2022-05-18T05:21:48.4053167Z Test sharded_tensor.rand(...)/randn(...) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 99031 2022-05-18T05:21:48.4166753Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 99032 2022-05-18T05:21:48.4278961Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 99033 2022-05-18T05:21:48.4393441Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 99034 2022-05-18T05:21:49.3487512Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:49.3528549Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:49.3837491Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:49.3910146Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:49.5434826Z skip: Need at least 4 CUDA devices (1.153s) 2022-05-18T05:21:49.5448610Z test_create_sharded_tensor_with_zeros (__main__.TestShardedTensorChunked) 2022-05-18T05:21:49.5573725Z Test sharded_tensor.zeros(...) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 99175 2022-05-18T05:21:49.5682766Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 99176 2022-05-18T05:21:49.5797156Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 99177 2022-05-18T05:21:49.5909851Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 99178 2022-05-18T05:21:50.5248497Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:50.5259114Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:50.5385515Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:50.5510694Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:50.7951953Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:21:50.7965440Z test_gather_even (__main__.TestShardedTensorChunked) 2022-05-18T05:21:50.8088922Z Test _sharded_tensor.gather(...) with evenly distributed._shards ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 99319 2022-05-18T05:21:50.8198850Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 99320 2022-05-18T05:21:50.8312136Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 99321 2022-05-18T05:21:50.8427035Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 99322 2022-05-18T05:21:51.7767058Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:51.7798883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:51.7884598Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:51.8419902Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:52.0469946Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:21:52.0483721Z test_gather_uneven (__main__.TestShardedTensorChunked) 2022-05-18T05:21:52.0612209Z Test _sharded_tensor.gather(...) with unevenly distributed._shards ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 99463 2022-05-18T05:21:52.0722571Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 99464 2022-05-18T05:21:52.0834887Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 99465 2022-05-18T05:21:52.0947379Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 99466 2022-05-18T05:21:53.0286259Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:53.0488889Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:53.0895382Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:53.0898064Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:53.2989240Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:21:53.3132216Z test_insufficient_sharding_dims (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 99607 2022-05-18T05:21:53.3242227Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 99608 2022-05-18T05:21:53.3360114Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 99609 2022-05-18T05:21:53.3471720Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 99610 2022-05-18T05:21:54.3305712Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:54.3454966Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:54.3486011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:54.3595715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:54.5516769Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:21:54.5654797Z test_invalid_pg_rpc_ranks (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 99751 2022-05-18T05:21:54.5766992Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 99752 2022-05-18T05:21:54.5882311Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 99753 2022-05-18T05:21:54.5997855Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 99754 2022-05-18T05:21:55.5524945Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:55.5762990Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:55.5766208Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:55.5938977Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:55.8042015Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:21:55.8196270Z test_invalid_sharding (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 99895 2022-05-18T05:21:55.8308120Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 99896 2022-05-18T05:21:55.8423675Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 99897 2022-05-18T05:21:55.8536106Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 99898 2022-05-18T05:21:56.7544828Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:56.7674303Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:56.7981306Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:56.8266722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:57.0581405Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:21:57.0734805Z test_load_state_dict_errors (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100039 2022-05-18T05:21:57.0852790Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100040 2022-05-18T05:21:57.0976269Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 100041 2022-05-18T05:21:57.1098139Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 100042 2022-05-18T05:21:58.0373798Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:58.0452102Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:58.0569816Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:58.0717923Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:58.3145450Z skip: Need at least 4 CUDA devices (1.256s) 2022-05-18T05:21:58.3305634Z test_multiple_local_shards (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100183 2022-05-18T05:21:58.3425617Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100184 2022-05-18T05:21:58.3550426Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 100185 2022-05-18T05:21:58.3669365Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 100186 2022-05-18T05:21:59.3639831Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:21:59.3688272Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:21:59.3715307Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:21:59.3918751Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:21:59.5713589Z skip: Need at least 4 CUDA devices (1.257s) 2022-05-18T05:21:59.5871072Z test_new_group (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100327 2022-05-18T05:21:59.5979838Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100328 2022-05-18T05:21:59.6097336Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 100329 2022-05-18T05:21:59.6209539Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 100330 2022-05-18T05:22:00.5443125Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:00.5948860Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:00.6204838Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:00.6231426Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:00.8253733Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:22:00.8402878Z test_partial_world_size (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100471 2022-05-18T05:22:00.8513984Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100472 2022-05-18T05:22:00.8629153Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 100473 2022-05-18T05:22:00.8740688Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 100474 2022-05-18T05:22:01.7926809Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:01.7927355Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:01.7938387Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:01.8455947Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:02.0783775Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:22:02.0935254Z test_sharded_tensor_metadata (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100615 2022-05-18T05:22:02.1045439Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100616 2022-05-18T05:22:02.1157793Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 100617 2022-05-18T05:22:02.1272174Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 100618 2022-05-18T05:22:03.0313677Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:03.0426220Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:03.0484241Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:03.0484768Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:03.2313042Z skip: Need at least 4 CUDA devices (1.153s) 2022-05-18T05:22:03.2460410Z test_sharded_tensor_sizes (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100759 2022-05-18T05:22:03.2568916Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100760 2022-05-18T05:22:03.2684311Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 100761 2022-05-18T05:22:03.2797511Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 100762 2022-05-18T05:22:04.1943075Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:04.1995885Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:04.1998914Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:04.2141573Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:04.3839370Z skip: Need at least 4 CUDA devices (1.152s) 2022-05-18T05:22:04.3979463Z test_sharding_columns (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 100903 2022-05-18T05:22:04.4087461Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 100904 2022-05-18T05:22:04.4200275Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 100905 2022-05-18T05:22:04.4313291Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 100906 2022-05-18T05:22:05.3802382Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:05.3850971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:05.4028718Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:05.4259818Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:05.6357771Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:22:05.6502038Z test_state_dict (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101047 2022-05-18T05:22:05.6616621Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101048 2022-05-18T05:22:05.6736042Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 101049 2022-05-18T05:22:05.6849956Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 101050 2022-05-18T05:22:06.5982543Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:06.6017694Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:06.6201000Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:06.6204298Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:06.7893095Z skip: Need at least 4 CUDA devices (1.153s) 2022-05-18T05:22:06.8035609Z test_state_dict_new_group (__main__.TestShardedTensorChunked) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101191 2022-05-18T05:22:06.8146684Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101192 2022-05-18T05:22:06.8261027Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 101193 2022-05-18T05:22:06.8376580Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 101194 2022-05-18T05:22:07.7486466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:07.7759212Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:07.8165510Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:07.8323647Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:08.0422872Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:22:08.0561099Z test_state_dict_no_sharded_tensors (__main__.TestShardedTensorChunked) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101335 2022-05-18T05:22:08.0673366Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101336 2022-05-18T05:22:08.0790712Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 101337 2022-05-18T05:22:08.0905434Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 101338 2022-05-18T05:22:09.0469715Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:09.0643366Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:09.0765975Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:09.0788813Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:09.2950727Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:22:09.3092027Z test_custom_op (__main__.TestShardedTensorCustomOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101479 2022-05-18T05:22:09.3206525Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101480 2022-05-18T05:22:09.3322940Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 101481 2022-05-18T05:22:09.3438214Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 101482 2022-05-18T05:22:10.2606245Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:10.2611042Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:10.2819097Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:10.2840187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:10.4479173Z skip: Need at least 4 CUDA devices (1.153s) 2022-05-18T05:22:10.4613178Z test_custom_op_errors (__main__.TestShardedTensorCustomOps) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101623 2022-05-18T05:22:10.4724101Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101624 2022-05-18T05:22:10.4844017Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 101625 2022-05-18T05:22:10.4960019Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 101626 2022-05-18T05:22:11.4055583Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:11.4489005Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:11.4703753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:11.4832011Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:11.7004073Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:22:11.7144255Z test_custom_op_override (__main__.TestShardedTensorCustomOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101767 2022-05-18T05:22:11.7259072Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101768 2022-05-18T05:22:11.7375260Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 101769 2022-05-18T05:22:11.7489734Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 101770 2022-05-18T05:22:12.7266407Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:12.7483848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:12.7504480Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:12.7580052Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:12.9533974Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:22:12.9552844Z test_create_sharded_tensor_with_ones (__main__.TestShardedTensorEnumerable) 2022-05-18T05:22:12.9676434Z Test sharded_tensor.ones(...) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 101911 2022-05-18T05:22:12.9787183Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 101912 2022-05-18T05:22:12.9902135Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 101913 2022-05-18T05:22:13.0014872Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 101914 2022-05-18T05:22:13.9204616Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:13.9637083Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:13.9695481Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:13.9701066Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:14.2061135Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:22:14.2079662Z test_gather_even (__main__.TestShardedTensorEnumerable) 2022-05-18T05:22:14.2208610Z Test _sharded_tensor.gather(...) with evenly distributed._shards ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102055 2022-05-18T05:22:14.2322608Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102056 2022-05-18T05:22:14.2438372Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 102057 2022-05-18T05:22:14.2553229Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 102058 2022-05-18T05:22:15.1930894Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:15.2012198Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:15.2116078Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:15.2218717Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:15.4598727Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:22:15.4618109Z test_gather_uneven (__main__.TestShardedTensorEnumerable) 2022-05-18T05:22:15.4748096Z Test _sharded_tensor.gather(...) with unevenly distributed._shards ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102199 2022-05-18T05:22:15.4860838Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102200 2022-05-18T05:22:15.4978161Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 102201 2022-05-18T05:22:15.5094661Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 102202 2022-05-18T05:22:16.4203879Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:16.4260539Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:16.4613225Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:16.4836675Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:16.7138848Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:22:16.7299843Z test_grid_sharding (__main__.TestShardedTensorEnumerable) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102343 2022-05-18T05:22:16.7414536Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102344 2022-05-18T05:22:16.7531780Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 102345 2022-05-18T05:22:16.7646854Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 102346 2022-05-18T05:22:17.6880126Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:17.7471813Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:17.7504947Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:17.7544442Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:17.9692227Z skip: Need at least 4 CUDA devices (1.255s) 2022-05-18T05:22:17.9856960Z test_multiple_local_shards (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102487 2022-05-18T05:22:17.9971031Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102488 2022-05-18T05:22:18.0090808Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 102489 2022-05-18T05:22:18.0205133Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 102490 2022-05-18T05:22:19.0090962Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:19.0305440Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:19.0548008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:19.0861881Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:19.2249066Z skip: Need at least 4 CUDA devices (1.256s) 2022-05-18T05:22:19.2405507Z test_new_group (__main__.TestShardedTensorEnumerable) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102631 2022-05-18T05:22:19.2517395Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102632 2022-05-18T05:22:19.2631564Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 102633 2022-05-18T05:22:19.2745038Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 102634 2022-05-18T05:22:20.2622896Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:20.2828576Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:20.2829101Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:20.2838079Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:20.4787620Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:22:20.4941498Z test_partial_world_size (__main__.TestShardedTensorEnumerable) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102775 2022-05-18T05:22:20.5050787Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102776 2022-05-18T05:22:20.5166051Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 102777 2022-05-18T05:22:20.5279201Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 102778 2022-05-18T05:22:21.4629681Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:21.4846195Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:21.5060996Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:21.5101836Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:21.7323634Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:22:21.7478231Z test_sharded_tensor_metadata (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 102919 2022-05-18T05:22:21.7589866Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 102920 2022-05-18T05:22:21.7702971Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 102921 2022-05-18T05:22:21.7818151Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 102922 2022-05-18T05:22:22.7477078Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:22.7659425Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:22.7861300Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:22.8425379Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:22.9862315Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:22:23.0024264Z test_sharded_tensor_to_cpu (__main__.TestShardedTensorEnumerable) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103063 2022-05-18T05:22:23.0138157Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103064 2022-05-18T05:22:23.0261524Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 103065 2022-05-18T05:22:23.0380036Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 103066 2022-05-18T05:22:23.9510885Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:23.9531899Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:23.9578167Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:24.0187644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:24.2423109Z skip: Need at least 4 CUDA devices (1.256s) 2022-05-18T05:22:24.2582171Z test_uneven_shards (__main__.TestShardedTensorEnumerable) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103207 2022-05-18T05:22:24.2694205Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103208 2022-05-18T05:22:24.2808598Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 103209 2022-05-18T05:22:24.2926066Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 103210 2022-05-18T05:22:25.2061077Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:25.2144805Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:25.2158249Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:25.2562088Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:25.4969611Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:22:25.5133601Z test_with_rpc_names (__main__.TestShardedTensorEnumerable) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103351 2022-05-18T05:22:25.5250625Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103352 2022-05-18T05:22:25.5370999Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 103353 2022-05-18T05:22:25.5485972Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 103354 2022-05-18T05:22:26.4575720Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:26.4851662Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:26.5295708Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:26.5442915Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:26.7530951Z skip: Need at least 4 CUDA devices (1.256s) 2022-05-18T05:22:26.7689929Z test_init_from_local_shards (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103495 2022-05-18T05:22:26.7803287Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103496 2022-05-18T05:22:26.7917908Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 103497 2022-05-18T05:22:26.8034446Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 103498 2022-05-18T05:22:27.7107650Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:27.7128886Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:27.7288643Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:27.7298110Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:27.9076938Z skip: Need at least 4 CUDA devices (1.154s) 2022-05-18T05:22:27.9239015Z test_init_from_local_shards_and_global_metadata (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103639 2022-05-18T05:22:27.9352153Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103640 2022-05-18T05:22:27.9468313Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 103641 2022-05-18T05:22:27.9583003Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 103642 2022-05-18T05:22:28.8759635Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:28.9132861Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:28.9143002Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:28.9214772Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:29.1625306Z skip: Need at least 4 CUDA devices (1.255s) 2022-05-18T05:22:29.1789963Z test_init_from_local_shards_and_global_metadata_invalid_shards (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103783 2022-05-18T05:22:29.1900560Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103784 2022-05-18T05:22:29.2012502Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 103785 2022-05-18T05:22:29.2124926Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 103786 2022-05-18T05:22:30.1089029Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:30.1218195Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:30.1417517Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:30.1859343Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:30.4169014Z skip: Need at least 4 CUDA devices (1.254s) 2022-05-18T05:22:30.4318884Z test_init_from_local_shards_invalid_local_shards (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 103927 2022-05-18T05:22:30.4431587Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 103928 2022-05-18T05:22:30.4550007Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 103929 2022-05-18T05:22:30.4663407Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 103930 2022-05-18T05:22:31.3655373Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:31.3751441Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:31.3873375Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:31.4140819Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:31.5707503Z skip: Need at least 4 CUDA devices (1.154s) 2022-05-18T05:22:31.5851637Z test_init_from_local_shards_invalid_pin_memory (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104071 2022-05-18T05:22:31.5961150Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104072 2022-05-18T05:22:31.6077422Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 104073 2022-05-18T05:22:31.6192766Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 104074 2022-05-18T05:22:32.5397244Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:32.5454949Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:32.5862565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:32.5880970Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:32.6014716Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:22:32.6218580Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:22:32.6220753Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T05:22:32.6221313Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T05:22:32.6222107Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:22:32.6222805Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:22:32.6223504Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:22:32.6224207Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:22:32.8235530Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:22:32.8383358Z test_init_from_local_shards_invalid_property_cross_ranks (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104227 2022-05-18T05:22:32.8494023Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104228 2022-05-18T05:22:32.8610105Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 104229 2022-05-18T05:22:32.8725946Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 104230 2022-05-18T05:22:33.7929130Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:33.7947753Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:33.8050087Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:33.8122792Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:33.9765559Z skip: Need at least 4 CUDA devices (1.153s) 2022-05-18T05:22:33.9900799Z test_init_from_local_shards_invalid_shards_gaps (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104371 2022-05-18T05:22:34.0010523Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104372 2022-05-18T05:22:34.0123089Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 104373 2022-05-18T05:22:34.0235966Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 104374 2022-05-18T05:22:34.9688116Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:34.9923112Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:35.0020407Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:35.0425515Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:35.2279711Z skip: Need at least 4 CUDA devices (1.251s) 2022-05-18T05:22:35.2426769Z test_init_from_local_shards_invalid_shards_overlap (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104515 2022-05-18T05:22:35.2537729Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104516 2022-05-18T05:22:35.2656313Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 104517 2022-05-18T05:22:35.2768216Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 104518 2022-05-18T05:22:36.2411359Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:36.2839969Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:36.2993173Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:36.3071355Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:36.4813495Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:22:36.4958436Z test_init_from_local_shards_new_group (__main__.TestShardedTensorFromLocalShards) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104659 2022-05-18T05:22:36.5067506Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104660 2022-05-18T05:22:36.5179253Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 104661 2022-05-18T05:22:36.5295119Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 104662 2022-05-18T05:22:37.4812447Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:37.5060823Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:37.5169949Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:37.5176551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:37.7338666Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:22:37.7481357Z test_local_shards (__main__.TestShardedTensorFromLocalShards) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104803 2022-05-18T05:22:37.7593844Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104804 2022-05-18T05:22:37.7714381Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 104805 2022-05-18T05:22:37.7828248Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 104806 2022-05-18T05:22:38.7342118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:38.7528168Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:38.7664926Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:38.7720494Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:38.9871548Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:22:39.0008071Z test_init_from_local_tensor (__main__.TestShardedTensorFromLocalTensor) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 104947 2022-05-18T05:22:39.0118722Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 104948 2022-05-18T05:22:39.0233706Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 104949 2022-05-18T05:22:39.0347126Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 104950 2022-05-18T05:22:39.9533736Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:39.9566488Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:39.9770725Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:39.9780812Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:40.1387341Z skip: Need at least 4 CUDA devices (1.151s) 2022-05-18T05:22:40.1526421Z test_init_from_local_tensor_errors (__main__.TestShardedTensorFromLocalTensor) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105091 2022-05-18T05:22:40.1636961Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105092 2022-05-18T05:22:40.1750897Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 105093 2022-05-18T05:22:40.1866157Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 105094 2022-05-18T05:22:41.0952697Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:22:41.1033160Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:22:41.1084662Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:22:41.1569733Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:22:41.3910198Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:22:41.4500788Z test_serialize_and_deserialize (__main__.TestShardedTensorMetadata) ... 
ok (0.059s) 2022-05-18T05:22:41.4501972Z 2022-05-18T05:22:41.4502408Z ---------------------------------------------------------------------- 2022-05-18T05:22:41.4502784Z Ran 58 tests in 70.632s 2022-05-18T05:22:41.4502960Z 2022-05-18T05:22:41.4503080Z OK (skipped=56) 2022-05-18T05:22:41.4503249Z 2022-05-18T05:22:41.4503362Z Generating XML reports... 2022-05-18T05:22:41.4545627Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestCreateTensorFromParams-20220518052130.xml 2022-05-18T05:22:41.4549133Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorMetadata-20220518052130.xml 2022-05-18T05:22:41.4554125Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestLocalTensor-20220518052130.xml 2022-05-18T05:22:41.4560192Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestModuleHookApi-20220518052130.xml 2022-05-18T05:22:41.4565415Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardParameter-20220518052130.xml 2022-05-18T05:22:41.4570732Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardTensor-20220518052130.xml 2022-05-18T05:22:41.4604574Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorChunked-20220518052130.xml 2022-05-18T05:22:41.4610977Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorCustomOps-20220518052130.xml 2022-05-18T05:22:41.4629294Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorEnumerable-20220518052130.xml 2022-05-18T05:22:41.4645436Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorFromLocalShards-20220518052130.xml 2022-05-18T05:22:41.4650415Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorFromLocalTensor-20220518052130.xml 2022-05-18T05:22:41.7288266Z Running distributed/test_c10d_spawn_gloo ... [2022-05-18 05:22:41.728310] 2022-05-18T05:22:41.7289040Z Executing ['/opt/conda/bin/python', 'distributed/test_c10d_spawn_gloo.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 05:22:41.728411] 2022-05-18T05:22:42.6378322Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaul3hbdb 2022-05-18T05:22:42.6379432Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaul3hbdb/_remote_module_non_scriptable.py 2022-05-18T05:22:44.2833891Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:22:44.2849118Z , <__main__.DistributedDataParallelSingleProcessTest testMethod=test_cuda>, <__main__.DistributedDataParallelSingleProcessTest testMethod=test_rnn>]> 2022-05-18T05:22:44.2850985Z test_cpu (__main__.DistributedDataParallelSingleProcessTest) 2022-05-18T05:22:44.2852025Z test_cuda (__main__.DistributedDataParallelSingleProcessTest) 2022-05-18T05:22:44.2852641Z test_rnn (__main__.DistributedDataParallelSingleProcessTest) 2022-05-18T05:22:44.2853568Z , <__main__.ProcessGroupShareTensorTest testMethod=test_shared_allgather_gloo>, <__main__.ProcessGroupShareTensorTest testMethod=test_shared_allreduce_gloo>, <__main__.ProcessGroupShareTensorTest testMethod=test_shared_broadcast_gloo>]> 2022-05-18T05:22:44.2854801Z test_shared_allgather_chunk_gloo (__main__.ProcessGroupShareTensorTest) 2022-05-18T05:22:44.2855580Z test_shared_allgather_gloo (__main__.ProcessGroupShareTensorTest) 2022-05-18T05:22:44.2856262Z test_shared_allreduce_gloo (__main__.ProcessGroupShareTensorTest) 2022-05-18T05:22:44.2856777Z test_shared_broadcast_gloo (__main__.ProcessGroupShareTensorTest) 2022-05-18T05:22:44.2857170Z 2022-05-18T05:22:44.2857500Z 2022-05-18T05:22:44.2858642Z , <__main__.TestDistributedNNFunctionsGloo testMethod=test_all_to_all>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_all_to_all_single>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_allreduce>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_broadcast>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_gather>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_reduce>, <__main__.TestDistributedNNFunctionsGloo testMethod=test_scatter>]> 2022-05-18T05:22:44.2859820Z test_all_gather (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:22:44.2860203Z test_all_to_all (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:22:44.2860606Z test_all_to_all_single (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:22:44.2861017Z test_allreduce (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:22:44.2861406Z test_broadcast (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:22:44.2861805Z test_gather (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:22:44.2862228Z test_reduce (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:22:44.2862598Z test_scatter (__main__.TestDistributedNNFunctionsGloo) 2022-05-18T05:22:45.1820016Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpilljlkex 2022-05-18T05:22:45.1821132Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpilljlkex/_remote_module_non_scriptable.py 2022-05-18T05:22:46.8277032Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:22:46.8311811Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:22:46.8326875Z 2022-05-18T05:22:46.8327252Z Running tests... 2022-05-18T05:22:46.8327760Z ---------------------------------------------------------------------- 2022-05-18T05:22:46.8403902Z test_cpu (__main__.DistributedDataParallelSingleProcessTest) ... INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 
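[Editor's note] The "Reducer buckets have been rebuilt in this iteration." record just above is emitted by DistributedDataParallel when it rebuilds its gradient buckets on the first training iteration. A minimal, hypothetical sketch (not the test body; the port, the single-process world, and the tiny Linear model are illustrative assumptions) that exercises the same code path with a gloo group on CPU:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process "world" so the example is self-contained; gloo works on CPU tensors.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(10, 10))    # wrap the module under DDP
loss = model(torch.randn(4, 10)).sum()
loss.backward()                         # first iteration: DDP rebuilds its reducer buckets
dist.destroy_process_group()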
2022-05-18T05:22:46.8554995Z ok (0.023s) 2022-05-18T05:22:46.8556437Z 2022-05-18T05:22:46.8556826Z ---------------------------------------------------------------------- 2022-05-18T05:22:46.8557171Z Ran 1 test in 0.023s 2022-05-18T05:22:46.8557344Z 2022-05-18T05:22:46.8557442Z OK 2022-05-18T05:22:46.8557581Z 2022-05-18T05:22:46.8557694Z Generating XML reports... 2022-05-18T05:22:46.8588446Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20220518052246.xml 2022-05-18T05:22:47.9778717Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpu7ly3uop 2022-05-18T05:22:47.9779627Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpu7ly3uop/_remote_module_non_scriptable.py 2022-05-18T05:22:49.6300483Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:22:49.6335992Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:22:49.6351176Z 2022-05-18T05:22:49.6351366Z Running tests... 2022-05-18T05:22:49.6351801Z ---------------------------------------------------------------------- 2022-05-18T05:22:49.9320094Z test_cuda (__main__.DistributedDataParallelSingleProcessTest) ... INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:22:49.9521269Z ok (0.317s) 2022-05-18T05:22:49.9523085Z 2022-05-18T05:22:49.9523753Z ---------------------------------------------------------------------- 2022-05-18T05:22:49.9524116Z Ran 1 test in 0.317s 2022-05-18T05:22:49.9524303Z 2022-05-18T05:22:49.9524408Z OK 2022-05-18T05:22:49.9524548Z 2022-05-18T05:22:49.9524682Z Generating XML reports... 2022-05-18T05:22:49.9559070Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20220518052249.xml 2022-05-18T05:22:51.1026919Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmppeqistnt 2022-05-18T05:22:51.1027803Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmppeqistnt/_remote_module_non_scriptable.py 2022-05-18T05:22:52.7481980Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:22:52.7517751Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:22:52.7533441Z 2022-05-18T05:22:52.7533809Z Running tests... 2022-05-18T05:22:52.7534323Z ---------------------------------------------------------------------- 2022-05-18T05:22:53.3128756Z test_rnn (__main__.DistributedDataParallelSingleProcessTest) ... Could not load symbol cublasGetSmCountTarget from libcublas.so.11. Error: /usr/local/cuda/lib64/libcublas.so.11: undefined symbol: cublasGetSmCountTarget 2022-05-18T05:22:53.4211797Z INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. 2022-05-18T05:22:53.9254289Z ok (1.172s) 2022-05-18T05:22:53.9255103Z 2022-05-18T05:22:53.9255577Z ---------------------------------------------------------------------- 2022-05-18T05:22:53.9255922Z Ran 1 test in 1.172s 2022-05-18T05:22:53.9256096Z 2022-05-18T05:22:53.9256195Z OK 2022-05-18T05:22:53.9256335Z 2022-05-18T05:22:53.9256471Z Generating XML reports... 
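[Editor's note] The "Could not load symbol cublasGetSmCountTarget from libcublas.so.11" warning above typically means the libcublas available at runtime predates the CUDA release that introduced that symbol; it is non-fatal here, since the test_rnn case still passes. A small, hypothetical check (the library path is the one printed in the log, so it only makes sense on a similarly laid-out image) to see whether a given libcublas exports that symbol:

import ctypes

# Path taken from the warning above.
lib = ctypes.CDLL("/usr/local/cuda/lib64/libcublas.so.11")
# hasattr() triggers a symbol lookup; False means this library does not export the symbol.
print(hasattr(lib, "cublasGetSmCountTarget"))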
2022-05-18T05:22:53.9291478Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20220518052252.xml 2022-05-18T05:22:55.1823186Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptl9zm3ua 2022-05-18T05:22:55.1824326Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptl9zm3ua/_remote_module_non_scriptable.py 2022-05-18T05:22:56.8295814Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:22:56.8331651Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:22:56.8347890Z 2022-05-18T05:22:56.8348266Z Running tests... 2022-05-18T05:22:56.8348790Z ---------------------------------------------------------------------- 2022-05-18T05:22:57.7952193Z test_shared_allgather_chunk_gloo (__main__.ProcessGroupShareTensorTest) ... INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbtv5vx15 2022-05-18T05:22:57.7952923Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbtv5vx15/_remote_module_non_scriptable.py 2022-05-18T05:22:57.8167407Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp36j9uqve 2022-05-18T05:22:57.8170119Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp36j9uqve/_remote_module_non_scriptable.py 2022-05-18T05:22:59.4877766Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:22:59.5107656Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:22:59.8394167Z ok (3.004s) 2022-05-18T05:22:59.8395064Z 2022-05-18T05:22:59.8395487Z ---------------------------------------------------------------------- 2022-05-18T05:22:59.8395830Z Ran 1 test in 3.005s 2022-05-18T05:22:59.8396006Z 2022-05-18T05:22:59.8396106Z OK 2022-05-18T05:22:59.8396251Z 2022-05-18T05:22:59.8396654Z Generating XML reports... 2022-05-18T05:22:59.8437451Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-ProcessGroupShareTensorTest-20220518052256.xml 2022-05-18T05:23:01.0139446Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpm7oh3pr8 2022-05-18T05:23:01.0140573Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpm7oh3pr8/_remote_module_non_scriptable.py 2022-05-18T05:23:02.6805776Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:02.6841797Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:23:02.6858043Z 2022-05-18T05:23:02.6858299Z Running tests... 2022-05-18T05:23:02.6858740Z ---------------------------------------------------------------------- 2022-05-18T05:23:04.6410354Z test_shared_allgather_gloo (__main__.ProcessGroupShareTensorTest) ... 
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5lefbmy6 2022-05-18T05:23:04.6411480Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5lefbmy6/_remote_module_non_scriptable.py 2022-05-18T05:23:04.6425261Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpapn5yey8 2022-05-18T05:23:04.6428104Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpapn5yey8/_remote_module_non_scriptable.py 2022-05-18T05:23:06.3378737Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:06.3700192Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:07.8672348Z ok (5.181s) 2022-05-18T05:23:07.8673382Z 2022-05-18T05:23:07.8674138Z ---------------------------------------------------------------------- 2022-05-18T05:23:07.8674477Z Ran 1 test in 5.181s 2022-05-18T05:23:07.8674648Z 2022-05-18T05:23:07.8674747Z OK 2022-05-18T05:23:07.8674887Z 2022-05-18T05:23:07.8677230Z Generating XML reports... 2022-05-18T05:23:07.8716696Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-ProcessGroupShareTensorTest-20220518052302.xml 2022-05-18T05:23:09.1112741Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpoxsb96qq 2022-05-18T05:23:09.1113652Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpoxsb96qq/_remote_module_non_scriptable.py 2022-05-18T05:23:10.7460791Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:10.7496415Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:23:10.7511968Z 2022-05-18T05:23:10.7512088Z Running tests... 2022-05-18T05:23:10.7512761Z ---------------------------------------------------------------------- 2022-05-18T05:23:12.6825474Z test_shared_allreduce_gloo (__main__.ProcessGroupShareTensorTest) ... INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvvywpa3b 2022-05-18T05:23:12.6826509Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvvywpa3b/_remote_module_non_scriptable.py 2022-05-18T05:23:12.7172458Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp193gu8ha 2022-05-18T05:23:12.7175386Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp193gu8ha/_remote_module_non_scriptable.py 2022-05-18T05:23:14.3802056Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:14.4178714Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:15.9248894Z ok (5.173s) 2022-05-18T05:23:15.9249662Z 2022-05-18T05:23:15.9250085Z ---------------------------------------------------------------------- 2022-05-18T05:23:15.9250739Z Ran 1 test in 5.174s 2022-05-18T05:23:15.9250892Z 2022-05-18T05:23:15.9251275Z OK 2022-05-18T05:23:15.9251418Z 2022-05-18T05:23:15.9251556Z Generating XML reports... 2022-05-18T05:23:15.9293271Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-ProcessGroupShareTensorTest-20220518052310.xml 2022-05-18T05:23:17.1701444Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjjnmn9db 2022-05-18T05:23:17.1702666Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjjnmn9db/_remote_module_non_scriptable.py 2022-05-18T05:23:18.8036699Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:18.8071264Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:23:18.8086787Z 2022-05-18T05:23:18.8086926Z Running tests... 
2022-05-18T05:23:18.8087363Z ---------------------------------------------------------------------- 2022-05-18T05:23:20.7181033Z test_shared_broadcast_gloo (__main__.ProcessGroupShareTensorTest) ... INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7bucrdkk 2022-05-18T05:23:20.7181759Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7bucrdkk/_remote_module_non_scriptable.py 2022-05-18T05:23:20.7298041Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp0afecb7a 2022-05-18T05:23:20.7300895Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp0afecb7a/_remote_module_non_scriptable.py 2022-05-18T05:23:22.4115973Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:22.4398049Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:23.9460269Z ok (5.137s) 2022-05-18T05:23:23.9461457Z 2022-05-18T05:23:23.9461897Z ---------------------------------------------------------------------- 2022-05-18T05:23:23.9462512Z Ran 1 test in 5.137s 2022-05-18T05:23:23.9462683Z 2022-05-18T05:23:23.9462782Z OK 2022-05-18T05:23:23.9462924Z 2022-05-18T05:23:23.9463067Z Generating XML reports... 2022-05-18T05:23:23.9503697Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-ProcessGroupShareTensorTest-20220518052318.xml 2022-05-18T05:23:25.1861365Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw4qkn3j5 2022-05-18T05:23:25.1862670Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw4qkn3j5/_remote_module_non_scriptable.py 2022-05-18T05:23:26.8367536Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:26.8403073Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:23:26.8419260Z 2022-05-18T05:23:26.8419402Z Running tests... 2022-05-18T05:23:26.8420038Z ---------------------------------------------------------------------- 2022-05-18T05:23:26.8785249Z test_all_gather (__main__.TestDistributedNNFunctionsGloo) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 105920 2022-05-18T05:23:26.8887177Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 105921 2022-05-18T05:23:27.7759402Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp3jetae0w 2022-05-18T05:23:27.7760743Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp3jetae0w/_remote_module_non_scriptable.py 2022-05-18T05:23:27.7793788Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpqv9k2_7e 2022-05-18T05:23:27.7797594Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpqv9k2_7e/_remote_module_non_scriptable.py 2022-05-18T05:23:29.4695721Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:29.4707979Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:23:29.4819867Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:29.4835007Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:23:29.5020305Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:23:29.5021279Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:23:29.5023062Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:23:29.5024248Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:23:30.8993860Z ok (4.057s) 2022-05-18T05:23:30.8994166Z 2022-05-18T05:23:30.8995017Z ---------------------------------------------------------------------- 2022-05-18T05:23:30.8995401Z Ran 1 test in 4.057s 2022-05-18T05:23:30.8995580Z 2022-05-18T05:23:30.8995676Z OK 2022-05-18T05:23:30.8995824Z 2022-05-18T05:23:30.8995968Z Generating XML reports... 2022-05-18T05:23:30.9038124Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052326.xml 2022-05-18T05:23:32.0750119Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpblgn7h9y 2022-05-18T05:23:32.0751500Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpblgn7h9y/_remote_module_non_scriptable.py 2022-05-18T05:23:33.6823215Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:33.6857674Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:23:33.6872594Z 2022-05-18T05:23:33.6873080Z Running tests... 2022-05-18T05:23:33.6873617Z ---------------------------------------------------------------------- 2022-05-18T05:23:33.7231180Z test_all_to_all (__main__.TestDistributedNNFunctionsGloo) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106042 2022-05-18T05:23:33.7331851Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106043 2022-05-18T05:23:34.6336133Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4k8jhlh3 2022-05-18T05:23:34.6336999Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4k8jhlh3/_remote_module_non_scriptable.py 2022-05-18T05:23:34.6395980Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1hale2en 2022-05-18T05:23:34.6398702Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1hale2en/_remote_module_non_scriptable.py 2022-05-18T05:23:36.3351396Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:36.3363807Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:23:36.3557841Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:36.3572030Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:23:36.3777388Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:23:36.3777911Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:23:36.3778713Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:23:36.3779404Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:23:37.7439056Z ok (4.056s) 2022-05-18T05:23:37.7439416Z 2022-05-18T05:23:37.7440182Z ---------------------------------------------------------------------- 2022-05-18T05:23:37.7440678Z Ran 1 test in 4.057s 2022-05-18T05:23:37.7440850Z 2022-05-18T05:23:37.7440952Z OK 2022-05-18T05:23:37.7441353Z 2022-05-18T05:23:37.7441644Z Generating XML reports... 
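[Editor's note] The store_based_barrier_key records above show the rendezvous that init_process_group performs: each rank registers itself in the shared store and blocks until all world_size ranks have checked in ("Completed store-based barrier ... with 2 nodes"). A minimal sketch of that handshake under stated assumptions (two local processes, gloo backend, an assumed-free port 29501; illustrative only, not the test's own setup):

import os
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29501"   # assumed-free port for the store rendezvous
    # Registers this rank in the store and waits for all ranks to join; the
    # "store_based_barrier_key" INFO records in the log come from this step.
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    dist.barrier()
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)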
2022-05-18T05:23:37.7485793Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052333.xml 2022-05-18T05:23:38.9359095Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpzfjfjb9c 2022-05-18T05:23:38.9361161Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpzfjfjb9c/_remote_module_non_scriptable.py 2022-05-18T05:23:40.6104744Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:40.6141088Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:23:40.6157497Z 2022-05-18T05:23:40.6157717Z Running tests... 2022-05-18T05:23:40.6158137Z ---------------------------------------------------------------------- 2022-05-18T05:23:40.6524258Z test_all_to_all_single (__main__.TestDistributedNNFunctionsGloo) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106164 2022-05-18T05:23:40.6625882Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106165 2022-05-18T05:23:41.5690706Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpz7yre6sf 2022-05-18T05:23:41.5692038Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpz7yre6sf/_remote_module_non_scriptable.py 2022-05-18T05:23:41.5902233Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaz8oyz15 2022-05-18T05:23:41.5904925Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaz8oyz15/_remote_module_non_scriptable.py 2022-05-18T05:23:43.2909499Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:43.2922443Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:23:43.2996363Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:43.3009743Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:23:43.3135769Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:23:43.3136288Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:23:43.3137087Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:23:43.3137766Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:23:44.6734121Z ok (4.057s) 2022-05-18T05:23:44.6734519Z 2022-05-18T05:23:44.6735266Z ---------------------------------------------------------------------- 2022-05-18T05:23:44.6735706Z Ran 1 test in 4.058s 2022-05-18T05:23:44.6735857Z 2022-05-18T05:23:44.6735956Z OK 2022-05-18T05:23:44.6736095Z 2022-05-18T05:23:44.6736565Z Generating XML reports... 
2022-05-18T05:23:44.6780120Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052340.xml 2022-05-18T05:23:45.8378447Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp5ny6m4wo 2022-05-18T05:23:45.8379484Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp5ny6m4wo/_remote_module_non_scriptable.py 2022-05-18T05:23:47.4482049Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:47.4516362Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:23:47.4531368Z 2022-05-18T05:23:47.4531595Z Running tests... 2022-05-18T05:23:47.4532053Z ---------------------------------------------------------------------- 2022-05-18T05:23:47.4890745Z test_allreduce (__main__.TestDistributedNNFunctionsGloo) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106286 2022-05-18T05:23:47.4991815Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106287 2022-05-18T05:23:48.4354741Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp90d9tder 2022-05-18T05:23:48.4355689Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp90d9tder/_remote_module_non_scriptable.py 2022-05-18T05:23:48.4536502Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpd6oy2qol 2022-05-18T05:23:48.4539253Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpd6oy2qol/_remote_module_non_scriptable.py 2022-05-18T05:23:50.1311726Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:50.1323950Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:23:50.1435625Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:50.1449517Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:23:50.1561561Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:23:50.1562246Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:23:50.1563050Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:23:50.1563756Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:23:51.6098999Z ok (4.156s) 2022-05-18T05:23:51.6099247Z 2022-05-18T05:23:51.6099658Z ---------------------------------------------------------------------- 2022-05-18T05:23:51.6099999Z Ran 1 test in 4.157s 2022-05-18T05:23:51.6100194Z 2022-05-18T05:23:51.6100301Z OK 2022-05-18T05:23:51.6100443Z 2022-05-18T05:23:51.6100582Z Generating XML reports... 2022-05-18T05:23:51.6145691Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052347.xml 2022-05-18T05:23:52.7840040Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpaqagbm44 2022-05-18T05:23:52.7841819Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpaqagbm44/_remote_module_non_scriptable.py 2022-05-18T05:23:54.4118114Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:54.4152403Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:23:54.4167263Z 2022-05-18T05:23:54.4167643Z Running tests... 
2022-05-18T05:23:54.4168132Z ---------------------------------------------------------------------- 2022-05-18T05:23:54.4524919Z test_broadcast (__main__.TestDistributedNNFunctionsGloo) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106408 2022-05-18T05:23:54.4626984Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106409 2022-05-18T05:23:55.3328958Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxr6w5i5e 2022-05-18T05:23:55.3329794Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxr6w5i5e/_remote_module_non_scriptable.py 2022-05-18T05:23:55.3426241Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjijmdi5w 2022-05-18T05:23:55.3429504Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjijmdi5w/_remote_module_non_scriptable.py 2022-05-18T05:23:57.0451201Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:57.0463559Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:23:57.0495594Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:23:57.0509875Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:23:57.0723799Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:23:57.0724700Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:23:57.0725532Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:23:57.0726246Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:23:58.4734410Z ok (4.056s) 2022-05-18T05:23:58.4734732Z 2022-05-18T05:23:58.4735117Z ---------------------------------------------------------------------- 2022-05-18T05:23:58.4735481Z Ran 1 test in 4.057s 2022-05-18T05:23:58.4735647Z 2022-05-18T05:23:58.4735744Z OK 2022-05-18T05:23:58.4735881Z 2022-05-18T05:23:58.4736022Z Generating XML reports... 2022-05-18T05:23:58.4779237Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052354.xml 2022-05-18T05:23:59.6572179Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbxvq0su6 2022-05-18T05:23:59.6573869Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbxvq0su6/_remote_module_non_scriptable.py 2022-05-18T05:24:01.3070333Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:01.3105105Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:24:01.3120012Z 2022-05-18T05:24:01.3120248Z Running tests... 2022-05-18T05:24:01.3120693Z ---------------------------------------------------------------------- 2022-05-18T05:24:01.3487437Z test_gather (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106530 2022-05-18T05:24:01.3589449Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106531 2022-05-18T05:24:02.2751722Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpih9bfkx1 2022-05-18T05:24:02.2753578Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpih9bfkx1/_remote_module_non_scriptable.py 2022-05-18T05:24:02.2833553Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6pv80k3m 2022-05-18T05:24:02.2836175Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6pv80k3m/_remote_module_non_scriptable.py 2022-05-18T05:24:03.9760580Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:03.9772980Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:24:03.9851643Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:03.9866127Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:24:04.0079026Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:24:04.0079613Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:24:04.0080397Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:24:04.0081102Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:24:05.4698097Z ok (4.157s) 2022-05-18T05:24:05.4698486Z 2022-05-18T05:24:05.4698963Z ---------------------------------------------------------------------- 2022-05-18T05:24:05.4699326Z Ran 1 test in 4.158s 2022-05-18T05:24:05.4699498Z 2022-05-18T05:24:05.4699577Z OK 2022-05-18T05:24:05.4699722Z 2022-05-18T05:24:05.4699863Z Generating XML reports... 2022-05-18T05:24:05.4745289Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052401.xml 2022-05-18T05:24:06.6521587Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpft_45_yl 2022-05-18T05:24:06.6522658Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpft_45_yl/_remote_module_non_scriptable.py 2022-05-18T05:24:08.2981904Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:08.3016674Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:24:08.3031908Z 2022-05-18T05:24:08.3032299Z Running tests... 2022-05-18T05:24:08.3032817Z ---------------------------------------------------------------------- 2022-05-18T05:24:08.3388739Z test_reduce (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106652 2022-05-18T05:24:08.3489852Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106653 2022-05-18T05:24:09.2377835Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcjws1rwc 2022-05-18T05:24:09.2380023Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcjws1rwc/_remote_module_non_scriptable.py 2022-05-18T05:24:09.3007308Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmptaxgvp5g 2022-05-18T05:24:09.3010109Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmptaxgvp5g/_remote_module_non_scriptable.py 2022-05-18T05:24:10.9523181Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:10.9536623Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:24:10.9833993Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:10.9848247Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:24:11.0052300Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:24:11.0052847Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:24:11.0053651Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:24:11.0054358Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:24:12.4600879Z ok (4.156s) 2022-05-18T05:24:12.4601181Z 2022-05-18T05:24:12.4601880Z ---------------------------------------------------------------------- 2022-05-18T05:24:12.4602587Z Ran 1 test in 4.157s 2022-05-18T05:24:12.4602819Z 2022-05-18T05:24:12.4602944Z OK 2022-05-18T05:24:12.4603086Z 2022-05-18T05:24:12.4603779Z Generating XML reports... 2022-05-18T05:24:12.4647349Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052408.xml 2022-05-18T05:24:13.6449928Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2ycsk8o3 2022-05-18T05:24:13.6451360Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2ycsk8o3/_remote_module_non_scriptable.py 2022-05-18T05:24:15.2905037Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:15.2941276Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_gloo 2022-05-18T05:24:15.2956824Z 2022-05-18T05:24:15.2957107Z Running tests... 2022-05-18T05:24:15.2957548Z ---------------------------------------------------------------------- 2022-05-18T05:24:15.3338778Z test_scatter (__main__.TestDistributedNNFunctionsGloo) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 106774 2022-05-18T05:24:15.3440849Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 106775 2022-05-18T05:24:16.2537496Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpolw09i7k 2022-05-18T05:24:16.2538740Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpolw09i7k/_remote_module_non_scriptable.py 2022-05-18T05:24:16.2988768Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp00vhm550 2022-05-18T05:24:16.2991544Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp00vhm550/_remote_module_non_scriptable.py 2022-05-18T05:24:17.9212721Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:17.9224872Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:24:17.9911231Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:17.9925095Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:24:18.0043230Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:24:18.0043755Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:24:18.0045560Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:24:18.0046277Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:24:19.3548327Z ok (4.059s) 2022-05-18T05:24:19.3548590Z 2022-05-18T05:24:19.3548985Z ---------------------------------------------------------------------- 2022-05-18T05:24:19.3549337Z Ran 1 test in 4.059s 2022-05-18T05:24:19.3549506Z 2022-05-18T05:24:19.3549605Z OK 2022-05-18T05:24:19.3549724Z 2022-05-18T05:24:19.3549864Z Generating XML reports... 2022-05-18T05:24:19.3595348Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052415.xml 2022-05-18T05:24:19.8865363Z Running distributed/test_c10d_spawn_nccl ... [2022-05-18 05:24:19.885983] 2022-05-18T05:24:19.8866158Z Executing ['/opt/conda/bin/python', 'distributed/test_c10d_spawn_nccl.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 05:24:19.886087] 2022-05-18T05:24:20.7776337Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpwm6wbzyw 2022-05-18T05:24:20.7776970Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpwm6wbzyw/_remote_module_non_scriptable.py 2022-05-18T05:24:22.3881566Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:22.3895157Z , <__main__.ProcessGroupShareTensorTest testMethod=test_shared_allreduce_nccl>, <__main__.ProcessGroupShareTensorTest testMethod=test_shared_broadcast_nccl>, <__main__.ProcessGroupShareTensorTest testMethod=test_shared_reduce_nccl>]> 2022-05-18T05:24:22.3896868Z test_shared_allgather_nccl (__main__.ProcessGroupShareTensorTest) 2022-05-18T05:24:22.3897632Z test_shared_allreduce_nccl (__main__.ProcessGroupShareTensorTest) 2022-05-18T05:24:22.3898146Z test_shared_broadcast_nccl (__main__.ProcessGroupShareTensorTest) 2022-05-18T05:24:22.3898575Z test_shared_reduce_nccl (__main__.ProcessGroupShareTensorTest) 2022-05-18T05:24:22.3898953Z 2022-05-18T05:24:22.3899326Z 2022-05-18T05:24:22.3900584Z , <__main__.TestDistributedNNFunctionsNccl testMethod=test_all_to_all>, <__main__.TestDistributedNNFunctionsNccl testMethod=test_all_to_all_single>, <__main__.TestDistributedNNFunctionsNccl testMethod=test_allreduce>, <__main__.TestDistributedNNFunctionsNccl testMethod=test_broadcast>, <__main__.TestDistributedNNFunctionsNccl testMethod=test_reduce>, <__main__.TestDistributedNNFunctionsNccl testMethod=test_reduce_scatter>]> 2022-05-18T05:24:22.3901664Z test_all_gather (__main__.TestDistributedNNFunctionsNccl) 2022-05-18T05:24:22.3902085Z test_all_to_all (__main__.TestDistributedNNFunctionsNccl) 2022-05-18T05:24:22.3902499Z test_all_to_all_single (__main__.TestDistributedNNFunctionsNccl) 2022-05-18T05:24:22.3902894Z test_allreduce (__main__.TestDistributedNNFunctionsNccl) 2022-05-18T05:24:22.3903295Z test_broadcast (__main__.TestDistributedNNFunctionsNccl) 2022-05-18T05:24:22.3903690Z test_reduce (__main__.TestDistributedNNFunctionsNccl) 2022-05-18T05:24:22.3904105Z test_reduce_scatter (__main__.TestDistributedNNFunctionsNccl) 2022-05-18T05:24:23.2847063Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp1pgnj6fz 2022-05-18T05:24:23.2847912Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp1pgnj6fz/_remote_module_non_scriptable.py 2022-05-18T05:24:24.8925245Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:24.8958074Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_nccl 2022-05-18T05:24:24.8973318Z 2022-05-18T05:24:24.8973438Z Running tests... 2022-05-18T05:24:24.8973889Z ---------------------------------------------------------------------- 2022-05-18T05:24:26.7966989Z test_shared_allgather_nccl (__main__.ProcessGroupShareTensorTest) ... 
INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpktl6igli 2022-05-18T05:24:26.7968342Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpktl6igli/_remote_module_non_scriptable.py 2022-05-18T05:24:26.8065176Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpf5ohmo2d 2022-05-18T05:24:26.8068005Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpf5ohmo2d/_remote_module_non_scriptable.py 2022-05-18T05:24:28.4791354Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:28.4994943Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:30.0254142Z ok (5.128s) 2022-05-18T05:24:30.0255186Z 2022-05-18T05:24:30.0255599Z ---------------------------------------------------------------------- 2022-05-18T05:24:30.0255980Z Ran 1 test in 5.128s 2022-05-18T05:24:30.0256140Z 2022-05-18T05:24:30.0256235Z OK 2022-05-18T05:24:30.0256378Z 2022-05-18T05:24:30.0256519Z Generating XML reports... 2022-05-18T05:24:30.0299229Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-ProcessGroupShareTensorTest-20220518052424.xml 2022-05-18T05:24:31.2534359Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2pgi9dq1 2022-05-18T05:24:31.2535787Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2pgi9dq1/_remote_module_non_scriptable.py 2022-05-18T05:24:32.9088418Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:32.9122432Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_nccl 2022-05-18T05:24:32.9138086Z 2022-05-18T05:24:32.9138243Z Running tests... 2022-05-18T05:24:32.9139063Z ---------------------------------------------------------------------- 2022-05-18T05:24:34.8060265Z test_shared_allreduce_nccl (__main__.ProcessGroupShareTensorTest) ... INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp6pylsft0 2022-05-18T05:24:34.8061317Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp6pylsft0/_remote_module_non_scriptable.py 2022-05-18T05:24:34.8576240Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp79lx57v_ 2022-05-18T05:24:34.8579144Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp79lx57v_/_remote_module_non_scriptable.py 2022-05-18T05:24:36.4807672Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:36.5379390Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:38.0405051Z ok (5.126s) 2022-05-18T05:24:38.0406205Z 2022-05-18T05:24:38.0406771Z ---------------------------------------------------------------------- 2022-05-18T05:24:38.0407124Z Ran 1 test in 5.127s 2022-05-18T05:24:38.0407294Z 2022-05-18T05:24:38.0407397Z OK 2022-05-18T05:24:38.0407533Z 2022-05-18T05:24:38.0407679Z Generating XML reports... 2022-05-18T05:24:38.0449719Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-ProcessGroupShareTensorTest-20220518052432.xml 2022-05-18T05:24:39.2723717Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpom8yi3a0 2022-05-18T05:24:39.2724956Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpom8yi3a0/_remote_module_non_scriptable.py 2022-05-18T05:24:40.9048146Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:40.9081336Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_nccl 2022-05-18T05:24:40.9097113Z 2022-05-18T05:24:40.9097256Z Running tests... 
2022-05-18T05:24:40.9097977Z ---------------------------------------------------------------------- 2022-05-18T05:24:42.8206905Z test_shared_broadcast_nccl (__main__.ProcessGroupShareTensorTest) ... INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpt92hkwrn 2022-05-18T05:24:42.8207868Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpt92hkwrn/_remote_module_non_scriptable.py 2022-05-18T05:24:42.8627868Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpxsszul2b 2022-05-18T05:24:42.8630472Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpxsszul2b/_remote_module_non_scriptable.py 2022-05-18T05:24:44.5240328Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:44.5258709Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:46.0582733Z ok (5.148s) 2022-05-18T05:24:46.0585211Z 2022-05-18T05:24:46.0586195Z ---------------------------------------------------------------------- 2022-05-18T05:24:46.0586656Z Ran 1 test in 5.149s 2022-05-18T05:24:46.0586829Z 2022-05-18T05:24:46.0586939Z OK 2022-05-18T05:24:46.0587213Z 2022-05-18T05:24:46.0587405Z Generating XML reports... 2022-05-18T05:24:46.0629467Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-ProcessGroupShareTensorTest-20220518052440.xml 2022-05-18T05:24:47.2827619Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdiuc_hv7 2022-05-18T05:24:47.2828796Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdiuc_hv7/_remote_module_non_scriptable.py 2022-05-18T05:24:48.9098597Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:48.9131386Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_nccl 2022-05-18T05:24:48.9146490Z 2022-05-18T05:24:48.9146629Z Running tests... 2022-05-18T05:24:48.9147332Z ---------------------------------------------------------------------- 2022-05-18T05:24:50.8200238Z test_shared_reduce_nccl (__main__.ProcessGroupShareTensorTest) ... INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp194h4b1x 2022-05-18T05:24:50.8200945Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp194h4b1x/_remote_module_non_scriptable.py 2022-05-18T05:24:50.8516648Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpv__4q86z 2022-05-18T05:24:50.8519384Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpv__4q86z/_remote_module_non_scriptable.py 2022-05-18T05:24:52.5239530Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:52.5245371Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:54.0593410Z ok (5.144s) 2022-05-18T05:24:54.0594415Z 2022-05-18T05:24:54.0594832Z ---------------------------------------------------------------------- 2022-05-18T05:24:54.0595203Z Ran 1 test in 5.145s 2022-05-18T05:24:54.0595382Z 2022-05-18T05:24:54.0595479Z OK 2022-05-18T05:24:54.0595617Z 2022-05-18T05:24:54.0595754Z Generating XML reports... 
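[Editor's note] In this --subprocess run every test case writes its own per-class XML report under test-reports/python-unittest/<test module>/, as the "Generated XML report" records show. A hedged sketch for aggregating those files afterwards, assuming the usual JUnit-style <testsuite tests=... failures=... errors=... skipped=...> attributes (adjust if the actual schema differs):

import glob
import xml.etree.ElementTree as ET

totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
for path in glob.glob("test-reports/python-unittest/**/*.xml", recursive=True):
    root = ET.parse(path).getroot()
    # Reports may have a single <testsuite> root or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.iter("testsuite")
    for suite in suites:
        for key in totals:
            totals[key] += int(suite.get(key, 0))
print(totals)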
2022-05-18T05:24:54.0637649Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-ProcessGroupShareTensorTest-20220518052448.xml 2022-05-18T05:24:55.2719734Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmphm7i0c2t 2022-05-18T05:24:55.2720677Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmphm7i0c2t/_remote_module_non_scriptable.py 2022-05-18T05:24:56.8780057Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:56.8812544Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_nccl 2022-05-18T05:24:56.8827736Z 2022-05-18T05:24:56.8827876Z Running tests... 2022-05-18T05:24:56.8828564Z ---------------------------------------------------------------------- 2022-05-18T05:24:56.9184903Z test_all_gather (__main__.TestDistributedNNFunctionsNccl) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107436 2022-05-18T05:24:56.9285090Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107437 2022-05-18T05:24:57.8188484Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvyrajmpv 2022-05-18T05:24:57.8189771Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvyrajmpv/_remote_module_non_scriptable.py 2022-05-18T05:24:57.8556635Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp2d89ahn7 2022-05-18T05:24:57.8559218Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp2d89ahn7/_remote_module_non_scriptable.py 2022-05-18T05:24:59.5290075Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:59.5302722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:24:59.5306449Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:24:59.5392648Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:24:59.5406511Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:24:59.5410307Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:24:59.5411918Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:24:59.5511862Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:00.9392999Z ok (4.056s) 2022-05-18T05:25:00.9393243Z 2022-05-18T05:25:00.9393634Z ---------------------------------------------------------------------- 2022-05-18T05:25:00.9393961Z Ran 1 test in 4.056s 2022-05-18T05:25:00.9394131Z 2022-05-18T05:25:00.9394229Z OK 2022-05-18T05:25:00.9394370Z 2022-05-18T05:25:00.9394506Z Generating XML reports... 2022-05-18T05:25:00.9437778Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052456.xml 2022-05-18T05:25:02.1128374Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpfag2p_wq 2022-05-18T05:25:02.1129250Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpfag2p_wq/_remote_module_non_scriptable.py 2022-05-18T05:25:03.7319842Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:03.7353486Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_nccl 2022-05-18T05:25:03.7369109Z 2022-05-18T05:25:03.7369254Z Running tests... 
2022-05-18T05:25:03.7369913Z ---------------------------------------------------------------------- 2022-05-18T05:25:03.7729800Z test_all_to_all (__main__.TestDistributedNNFunctionsNccl) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107561 2022-05-18T05:25:03.7833168Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107562 2022-05-18T05:25:04.6675780Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpkpdahukj 2022-05-18T05:25:04.6676563Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpkpdahukj/_remote_module_non_scriptable.py 2022-05-18T05:25:04.6715578Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp54hg74i9 2022-05-18T05:25:04.6718542Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp54hg74i9/_remote_module_non_scriptable.py 2022-05-18T05:25:06.3636950Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:06.3649493Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:25:06.3653884Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:25:06.3924833Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:06.3938713Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:25:06.3943075Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:06.3944093Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:06.3960279Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:07.7939539Z ok (4.057s) 2022-05-18T05:25:07.7939783Z 2022-05-18T05:25:07.7940203Z ---------------------------------------------------------------------- 2022-05-18T05:25:07.7940539Z Ran 1 test in 4.057s 2022-05-18T05:25:07.7940715Z 2022-05-18T05:25:07.7940815Z OK 2022-05-18T05:25:07.7940960Z 2022-05-18T05:25:07.7941193Z Generating XML reports... 2022-05-18T05:25:07.7985775Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052503.xml 2022-05-18T05:25:08.9797333Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpksfv0bp0 2022-05-18T05:25:08.9798715Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpksfv0bp0/_remote_module_non_scriptable.py 2022-05-18T05:25:10.6318654Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:10.6353022Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_nccl 2022-05-18T05:25:10.6368706Z 2022-05-18T05:25:10.6369369Z Running tests... 2022-05-18T05:25:10.6369892Z ---------------------------------------------------------------------- 2022-05-18T05:25:10.6735417Z test_all_to_all_single (__main__.TestDistributedNNFunctionsNccl) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107686 2022-05-18T05:25:10.6838002Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107687 2022-05-18T05:25:11.5592452Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpi60sowsl 2022-05-18T05:25:11.5593673Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpi60sowsl/_remote_module_non_scriptable.py 2022-05-18T05:25:11.5940149Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpird6thtc 2022-05-18T05:25:11.5942780Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpird6thtc/_remote_module_non_scriptable.py 2022-05-18T05:25:13.2435983Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:13.2448414Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:25:13.2453068Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:13.2514243Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:13.2527515Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:25:13.2531336Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:25:13.2532539Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:13.2556090Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:14.6962339Z ok (4.059s) 2022-05-18T05:25:14.6962602Z 2022-05-18T05:25:14.6963009Z ---------------------------------------------------------------------- 2022-05-18T05:25:14.6963362Z Ran 1 test in 4.059s 2022-05-18T05:25:14.6963534Z 2022-05-18T05:25:14.6963630Z OK 2022-05-18T05:25:14.6964895Z 2022-05-18T05:25:14.6965166Z Generating XML reports... 2022-05-18T05:25:14.7007319Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052510.xml 2022-05-18T05:25:15.8869938Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpx2rid7qm 2022-05-18T05:25:15.8871322Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpx2rid7qm/_remote_module_non_scriptable.py 2022-05-18T05:25:17.5492314Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:17.5525412Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_nccl 2022-05-18T05:25:17.5540719Z 2022-05-18T05:25:17.5541076Z Running tests... 2022-05-18T05:25:17.5541604Z ---------------------------------------------------------------------- 2022-05-18T05:25:17.5901368Z test_allreduce (__main__.TestDistributedNNFunctionsNccl) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107811 2022-05-18T05:25:17.6003146Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107812 2022-05-18T05:25:18.4697686Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpipt8wsiq 2022-05-18T05:25:18.4699043Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpipt8wsiq/_remote_module_non_scriptable.py 2022-05-18T05:25:18.4713436Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeia3mrsw 2022-05-18T05:25:18.4716130Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeia3mrsw/_remote_module_non_scriptable.py 2022-05-18T05:25:20.1870208Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:20.1881949Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:25:20.1885969Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:25:20.1908225Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:20.1921486Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:25:20.1925318Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:20.1926583Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:20.1989132Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:21.7111803Z ok (4.157s) 2022-05-18T05:25:21.7112041Z 2022-05-18T05:25:21.7112437Z ---------------------------------------------------------------------- 2022-05-18T05:25:21.7112770Z Ran 1 test in 4.157s 2022-05-18T05:25:21.7112939Z 2022-05-18T05:25:21.7113037Z OK 2022-05-18T05:25:21.7113176Z 2022-05-18T05:25:21.7113309Z Generating XML reports... 2022-05-18T05:25:21.7157771Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052517.xml 2022-05-18T05:25:22.9064218Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpbnpa78rq 2022-05-18T05:25:22.9068991Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpbnpa78rq/_remote_module_non_scriptable.py 2022-05-18T05:25:24.5688294Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:24.5721565Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_nccl 2022-05-18T05:25:24.5736517Z 2022-05-18T05:25:24.5736916Z Running tests... 2022-05-18T05:25:24.5737439Z ---------------------------------------------------------------------- 2022-05-18T05:25:24.6097967Z test_broadcast (__main__.TestDistributedNNFunctionsNccl) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 107936 2022-05-18T05:25:24.6198489Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 107937 2022-05-18T05:25:25.5043920Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp68dxq0p5 2022-05-18T05:25:25.5045289Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp68dxq0p5/_remote_module_non_scriptable.py 2022-05-18T05:25:25.5254600Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcab54bwq 2022-05-18T05:25:25.5257327Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcab54bwq/_remote_module_non_scriptable.py 2022-05-18T05:25:27.2086813Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:27.2098816Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:25:27.2102522Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:25:27.2250776Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:27.2265090Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:25:27.2269143Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:27.2270343Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:27.2307354Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:28.7306268Z ok (4.157s) 2022-05-18T05:25:28.7306492Z 2022-05-18T05:25:28.7306906Z ---------------------------------------------------------------------- 2022-05-18T05:25:28.7307602Z Ran 1 test in 4.157s 2022-05-18T05:25:28.7307775Z 2022-05-18T05:25:28.7307877Z OK 2022-05-18T05:25:28.7308017Z 2022-05-18T05:25:28.7308158Z Generating XML reports... 2022-05-18T05:25:28.7351960Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052524.xml 2022-05-18T05:25:29.8940422Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmprblp08ju 2022-05-18T05:25:29.8941719Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmprblp08ju/_remote_module_non_scriptable.py 2022-05-18T05:25:31.4987401Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:31.5019846Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_nccl 2022-05-18T05:25:31.5034678Z 2022-05-18T05:25:31.5034907Z Running tests... 2022-05-18T05:25:31.5035350Z ---------------------------------------------------------------------- 2022-05-18T05:25:31.5391350Z test_reduce (__main__.TestDistributedNNFunctionsNccl) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 108061 2022-05-18T05:25:31.5493483Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 108062 2022-05-18T05:25:32.4587036Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpvi6h1ek3 2022-05-18T05:25:32.4588210Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpvi6h1ek3/_remote_module_non_scriptable.py 2022-05-18T05:25:32.4750533Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpdd4zr3qy 2022-05-18T05:25:32.4753491Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpdd4zr3qy/_remote_module_non_scriptable.py 2022-05-18T05:25:34.1415585Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:34.1427352Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:25:34.1431137Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:34.1706708Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:34.1721091Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:25:34.1726012Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:25:34.1727138Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:34.1737644Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:35.6602409Z ok (4.156s) 2022-05-18T05:25:35.6602648Z 2022-05-18T05:25:35.6603052Z ---------------------------------------------------------------------- 2022-05-18T05:25:35.6603386Z Ran 1 test in 4.157s 2022-05-18T05:25:35.6603554Z 2022-05-18T05:25:35.6603652Z OK 2022-05-18T05:25:35.6603798Z 2022-05-18T05:25:35.6603963Z Generating XML reports... 2022-05-18T05:25:35.6648191Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052531.xml 2022-05-18T05:25:36.8367543Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp4muq0vlm 2022-05-18T05:25:36.8368514Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp4muq0vlm/_remote_module_non_scriptable.py 2022-05-18T05:25:38.4837546Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:38.4870913Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_spawn_nccl 2022-05-18T05:25:38.4886285Z 2022-05-18T05:25:38.4886660Z Running tests... 2022-05-18T05:25:38.4887172Z ---------------------------------------------------------------------- 2022-05-18T05:25:38.5257794Z test_reduce_scatter (__main__.TestDistributedNNFunctionsNccl) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 108186 2022-05-18T05:25:38.5358717Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 108187 2022-05-18T05:25:39.4194711Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmplvkocd45 2022-05-18T05:25:39.4196021Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmplvkocd45/_remote_module_non_scriptable.py 2022-05-18T05:25:39.4555049Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpcfrsjhv_ 2022-05-18T05:25:39.4557544Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpcfrsjhv_/_remote_module_non_scriptable.py 2022-05-18T05:25:41.1263332Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:41.1275066Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:25:41.1278922Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:25:41.1402872Z INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:41.1417357Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:25:41.1421529Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:41.1422674Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:41.1484094Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:42.6482134Z ok (4.159s) 2022-05-18T05:25:42.6482352Z 2022-05-18T05:25:42.6482758Z ---------------------------------------------------------------------- 2022-05-18T05:25:42.6483082Z Ran 1 test in 4.159s 2022-05-18T05:25:42.6483252Z 2022-05-18T05:25:42.6483354Z OK 2022-05-18T05:25:42.6483495Z 2022-05-18T05:25:42.6483635Z Generating XML reports... 2022-05-18T05:25:42.6527221Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052538.xml 2022-05-18T05:25:43.1714051Z Running distributed/fsdp/test_wrap ... [2022-05-18 05:25:43.170876] 2022-05-18T05:25:43.1715270Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_wrap.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:25:43.170981] 2022-05-18T05:25:44.0960870Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_wrap 2022-05-18T05:25:44.0982746Z 2022-05-18T05:25:44.0983055Z Running tests... 2022-05-18T05:25:44.0983513Z ---------------------------------------------------------------------- 2022-05-18T05:25:44.0991080Z test_always_wrap (__main__.TestAutoWrap) 2022-05-18T05:25:45.7046972Z Test to ensure that if `always_wrap_policy` is ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:25:45.7225468Z ok (1.624s) 2022-05-18T05:25:45.7249674Z test_always_wrap_with_ignored_modules_wrap_method_WrapMethod_FSDP_CTOR (__main__.TestAutoWrap) ... [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7251456Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7252740Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7254017Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7264762Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:25:45.7265407Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:25:45.7302218Z ok (0.008s) 2022-05-18T05:25:45.7366611Z test_always_wrap_with_ignored_modules_wrap_method_WrapMethod_WRAP_API (__main__.TestAutoWrap) ... ok (0.006s) 2022-05-18T05:25:45.7373852Z test_auto_wrap_api (__main__.TestAutoWrap) 2022-05-18T05:25:45.7385147Z Test to ensure with auto wrap, we wrap child modules correctly based on the min_num_params. ... [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7386560Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7387817Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7389093Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7390353Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7391611Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7410435Z ok (0.004s) 2022-05-18T05:25:45.7420166Z test_auto_wrap_preset_exclude_wrap (__main__.TestAutoWrap) 2022-05-18T05:25:45.7437334Z Test to ensure excluded modules are not wrapped, regardless if the total param size is greater than the ... ok (0.003s) 2022-05-18T05:25:45.7445121Z test_auto_wrap_preset_exclude_wrap_include_children (__main__.TestAutoWrap) 2022-05-18T05:25:45.7462119Z Test to ensure excluded modules are not wrapped, but children are if param size is greater than ... ok (0.002s) 2022-05-18T05:25:45.7471945Z test_auto_wrap_preset_force_leaf (__main__.TestAutoWrap) 2022-05-18T05:25:45.7508107Z Test to ensure force-leaf modules are not wrapped, and children are not wrapped. The ... ok (0.004s) 2022-05-18T05:25:45.7518759Z test_auto_wrap_preset_force_leaf_custom (__main__.TestAutoWrap) 2022-05-18T05:25:45.7537255Z Test to ensure force-leaf modules are not wrapped. ... ok (0.003s) 2022-05-18T05:25:45.7571426Z test_auto_wrap_smoke_test_fsdp_init_mode_FSDPInitMode_CUDA_AFTER_cpu_offload_CPUOffload(offload_params=False)_use_device_id_False (__main__.TestAutoWrap) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:45.7572470Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:25:45.7578471Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7579789Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7581064Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7582334Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:45.7583604Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T05:25:46.0408143Z ok (0.287s) 2022-05-18T05:25:46.0440201Z test_auto_wrap_smoke_test_fsdp_init_mode_FSDPInitMode_CUDA_AFTER_cpu_offload_CPUOffload(offload_params=False)_use_device_id_True (__main__.TestAutoWrap) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:46.0441404Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:25:46.0500349Z ok (0.009s) 2022-05-18T05:25:46.0525250Z test_auto_wrap_smoke_test_fsdp_init_mode_FSDPInitMode_CUDA_AFTER_cpu_offload_CPUOffload(offload_params=True)_use_device_id_False (__main__.TestAutoWrap) ... ok (0.002s) 2022-05-18T05:25:46.0553097Z test_auto_wrap_smoke_test_fsdp_init_mode_FSDPInitMode_CUDA_AFTER_cpu_offload_CPUOffload(offload_params=True)_use_device_id_True (__main__.TestAutoWrap) ... ok (0.003s) 2022-05-18T05:25:46.0585985Z test_auto_wrap_smoke_test_fsdp_init_mode_FSDPInitMode_CUDA_BEFORE_cpu_offload_CPUOffload(offload_params=False)_use_device_id_False (__main__.TestAutoWrap) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:46.0586886Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:25:46.0607522Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:46.0608826Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:46.0610775Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:46.0612087Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:46.0648850Z ok (0.009s) 2022-05-18T05:25:46.0678628Z test_auto_wrap_smoke_test_fsdp_init_mode_FSDPInitMode_CUDA_BEFORE_cpu_offload_CPUOffload(offload_params=False)_use_device_id_True (__main__.TestAutoWrap) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:46.0679723Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:25:46.0735595Z ok (0.009s) 2022-05-18T05:25:46.0768397Z test_auto_wrap_smoke_test_fsdp_init_mode_FSDPInitMode_CUDA_BEFORE_cpu_offload_CPUOffload(offload_params=True)_use_device_id_False (__main__.TestAutoWrap) ... 
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:46.0769264Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:25:46.0862488Z ok (0.013s) 2022-05-18T05:25:46.0893146Z test_auto_wrap_smoke_test_fsdp_init_mode_FSDPInitMode_CUDA_BEFORE_cpu_offload_CPUOffload(offload_params=True)_use_device_id_True (__main__.TestAutoWrap) ... INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:46.0894047Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 2022-05-18T05:25:46.0912276Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:46.0913790Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:46.0915269Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:46.0916566Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:46.0978157Z ok (0.011s) 2022-05-18T05:25:46.1020450Z test_auto_wrap_with_ignored_modules_wrap_method_WrapMethod_FSDP_CTOR (__main__.TestAutoWrap) ... ok (0.004s) 2022-05-18T05:25:46.1062790Z test_auto_wrap_with_ignored_modules_wrap_method_WrapMethod_WRAP_API (__main__.TestAutoWrap) ... ok (0.004s) 2022-05-18T05:25:46.1099005Z test_transformer_auto_wrap_policy (__main__.TestAutoWrap) ... [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:46.1100848Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:46.1303711Z ok (0.024s) 2022-05-18T05:25:46.1326600Z test_wrap_disabled_outside_context (__main__.TestAutoWrap) ... [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. 
This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:25:46.1329576Z ok (0.002s) 2022-05-18T05:25:46.1355376Z test_wrap_override_defaults (__main__.TestAutoWrap) ... ok (0.002s) 2022-05-18T05:25:46.1380987Z test_wrap_wrap_method_WrapMethod_FSDP_CTOR (__main__.TestAutoWrap) ... ok (0.002s) 2022-05-18T05:25:46.1406193Z test_wrap_wrap_method_WrapMethod_WRAP_API (__main__.TestAutoWrap) ... ok (0.002s) 2022-05-18T05:25:46.1422494Z test_bn_always_wrapped_individually (__main__.TestFSDPWrap) 2022-05-18T05:25:46.1705667Z Ensures that by using _or_policy with _wrap_batchnorm_individually, even ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 108319 2022-05-18T05:25:46.1836089Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 108320 2022-05-18T05:25:47.1213963Z dist init r=1, world=2 2022-05-18T05:25:47.1217112Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:25:47.1296811Z dist init r=0, world=2 2022-05-18T05:25:47.1301354Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:47.1302571Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:47.1320219Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:48.4955797Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:25:48.4956315Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:25:48.5165426Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:25:48.5166120Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:25:48.5166970Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:25:48.5167613Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:25:48.7907447Z ok (2.650s) 2022-05-18T05:25:48.7917461Z test_error_already_wrapped_nested_False_fsdp_init_mode_FSDPInitMode_CUDA_AFTER (__main__.TestFSDPWrap) 2022-05-18T05:25:48.8057998Z Test that an error is raised if we attempt to wrap when submodules are ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 108402 2022-05-18T05:25:48.8183657Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 108403 2022-05-18T05:25:49.7397446Z dist init r=1, world=2 2022-05-18T05:25:49.7400855Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:25:49.7409057Z dist init r=0, world=2 2022-05-18T05:25:49.7414152Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:49.7415562Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:49.7504159Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:51.1479413Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:25:51.1479955Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:25:51.1671694Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:25:51.1672395Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:25:51.1707545Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:25:51.1708186Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:25:51.4256053Z ok (2.635s) 2022-05-18T05:25:51.4266471Z test_error_already_wrapped_nested_False_fsdp_init_mode_FSDPInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) 2022-05-18T05:25:51.4410298Z Test that an error is raised if we attempt to wrap when submodules are ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 108485 2022-05-18T05:25:51.4540439Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 108486 2022-05-18T05:25:52.3714071Z dist init r=1, world=2 2022-05-18T05:25:52.3717215Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:25:52.3741230Z dist init r=0, world=2 2022-05-18T05:25:52.3746159Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:52.3747420Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:52.3820480Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:53.7476064Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:25:53.7476594Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:25:54.0614679Z ok (2.636s) 2022-05-18T05:25:54.0624783Z test_error_already_wrapped_nested_True_fsdp_init_mode_FSDPInitMode_CUDA_AFTER (__main__.TestFSDPWrap) 2022-05-18T05:25:54.0769800Z Test that an error is raised if we attempt to wrap when submodules are ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 108568 2022-05-18T05:25:54.0898755Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 108569 2022-05-18T05:25:55.0022367Z dist init r=1, world=2 2022-05-18T05:25:55.0025490Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:25:55.0108645Z dist init r=0, world=2 2022-05-18T05:25:55.0113733Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:55.0114857Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:55.0128556Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:56.4143217Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:25:56.4143764Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:25:56.4353392Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:25:56.4354097Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:25:56.4354959Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:25:56.4355598Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:25:56.6969998Z ok (2.635s) 2022-05-18T05:25:56.6980626Z test_error_already_wrapped_nested_True_fsdp_init_mode_FSDPInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) 2022-05-18T05:25:56.7121321Z Test that an error is raised if we attempt to wrap when submodules are ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 108651 2022-05-18T05:25:56.7248572Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 108652 2022-05-18T05:25:57.6409616Z dist init r=0, world=2 2022-05-18T05:25:57.6413689Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:25:57.6708040Z dist init r=1, world=2 2022-05-18T05:25:57.6712911Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:25:57.6713810Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:57.6720111Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:25:59.0404268Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:25:59.0404801Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:25:59.3321235Z ok (2.635s) 2022-05-18T05:25:59.3496056Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_POST_fsdp_init_mode_FSDPInitMode_CUDA_AFTER (__main__.TestFSDPWrap) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 108734 2022-05-18T05:25:59.3622532Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 108735 2022-05-18T05:26:00.2778650Z dist init r=1, world=2 2022-05-18T05:26:00.2781909Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:00.2884173Z dist init r=0, world=2 2022-05-18T05:26:00.2888761Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:00.2889563Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:00.2987097Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:01.6558287Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:01.6558818Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:01.6757112Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:26:01.6757779Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:26:01.6758873Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:26:01.6759521Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:26:02.2702825Z ok (2.938s) 2022-05-18T05:26:02.2882786Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_POST_fsdp_init_mode_FSDPInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 108821 2022-05-18T05:26:02.3012244Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 108822 2022-05-18T05:26:03.2263817Z dist init r=1, world=2 2022-05-18T05:26:03.2267738Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:03.2587687Z dist init r=0, world=2 2022-05-18T05:26:03.2592247Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:03.2593044Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:03.2676130Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:04.6276712Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:04.6277232Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:05.3093019Z ok (3.039s) 2022-05-18T05:26:05.3269078Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE_fsdp_init_mode_FSDPInitMode_CUDA_AFTER (__main__.TestFSDPWrap) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 108908 2022-05-18T05:26:05.3396991Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 108909 2022-05-18T05:26:06.2538003Z dist init r=1, world=2 2022-05-18T05:26:06.2541952Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:06.2941580Z dist init r=0, world=2 2022-05-18T05:26:06.2946763Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:06.2947590Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:06.2950360Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:07.6931563Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:07.6932091Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:07.7157433Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:26:07.7158123Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:26:07.7193199Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:26:07.7193853Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:26:08.3477244Z ok (3.038s) 2022-05-18T05:26:08.3651988Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=False)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE_fsdp_init_mode_FSDPInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 108995 2022-05-18T05:26:08.3778954Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 108996 2022-05-18T05:26:09.3019446Z dist init r=0, world=2 2022-05-18T05:26:09.3022992Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:09.3298192Z dist init r=1, world=2 2022-05-18T05:26:09.3302426Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:09.3303577Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:09.3329690Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:10.7235551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:10.7236116Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:11.3862803Z ok (3.038s) 2022-05-18T05:26:11.4041484Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_POST_fsdp_init_mode_FSDPInitMode_CUDA_AFTER (__main__.TestFSDPWrap) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109082 2022-05-18T05:26:11.4170183Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109083 2022-05-18T05:26:12.3331779Z dist init r=0, world=2 2022-05-18T05:26:12.3334926Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:12.3468347Z dist init r=1, world=2 2022-05-18T05:26:12.3473119Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:12.3473986Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:12.3539950Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:13.7305026Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:13.7305551Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:14.0241577Z ok (2.638s) 2022-05-18T05:26:14.0424034Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_POST_fsdp_init_mode_FSDPInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109165 2022-05-18T05:26:14.0552438Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109166 2022-05-18T05:26:14.9810738Z dist init r=1, world=2 2022-05-18T05:26:14.9814570Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:15.0015251Z dist init r=0, world=2 2022-05-18T05:26:15.0020225Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:15.0021016Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:15.0121588Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:16.3729288Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:16.3730051Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:17.0634213Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:17.0636001Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:17.0637334Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. 
(function concrete_decref_fn) 2022-05-18T05:26:17.0638368Z ok (3.040s) 2022-05-18T05:26:17.0817071Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE_fsdp_init_mode_FSDPInitMode_CUDA_AFTER (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109252 2022-05-18T05:26:17.0947408Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109253 2022-05-18T05:26:18.0234502Z dist init r=0, world=2 2022-05-18T05:26:18.0237673Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:18.0320216Z dist init r=1, world=2 2022-05-18T05:26:18.0325485Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:18.0326506Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:18.0341148Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:19.4217013Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:19.4217562Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:19.7020277Z ok (2.638s) 2022-05-18T05:26:19.7196289Z test_main_wrap_api_cpu_offload_CPUOffload(offload_params=True)_backward_prefetch_BackwardPrefetch_BACKWARD_PRE_fsdp_init_mode_FSDPInitMode_CUDA_BEFORE (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109335 2022-05-18T05:26:19.7324691Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109336 2022-05-18T05:26:20.6542551Z dist init r=1, world=2 2022-05-18T05:26:20.6545675Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:20.6565794Z dist init r=0, world=2 2022-05-18T05:26:20.6570325Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:20.6571499Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:20.6649002Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:22.0503285Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:22.0503843Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:22.7405726Z ok (3.038s) 2022-05-18T05:26:22.7558038Z test_wrap_batchnorm_individually_use_or_policy_False (__main__.TestFSDPWrap) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109422 2022-05-18T05:26:22.7684624Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109423 2022-05-18T05:26:23.6863827Z dist init r=0, world=2 2022-05-18T05:26:23.6867007Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:23.6900409Z dist init r=1, world=2 2022-05-18T05:26:23.6905154Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:23.6906742Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:23.6970561Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:25.0689212Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:25.0689788Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:25.0925299Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:26:25.0926215Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:26:25.0959574Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:26:25.0960433Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:26:25.3759352Z ok (2.635s) 2022-05-18T05:26:25.3912029Z test_wrap_batchnorm_individually_use_or_policy_True (__main__.TestFSDPWrap) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109505 2022-05-18T05:26:25.4038384Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109506 2022-05-18T05:26:26.3235528Z dist init r=1, world=2 2022-05-18T05:26:26.3238662Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:26.3244434Z dist init r=0, world=2 2022-05-18T05:26:26.3248887Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:26.3250021Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:26.3342164Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:27.7178649Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:27.7179433Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:27.7404322Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T05:26:27.7405268Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:26:27.7406145Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:26:27.7406788Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:26:28.0110619Z ok (2.635s) 2022-05-18T05:26:28.0110811Z 2022-05-18T05:26:28.0111246Z ---------------------------------------------------------------------- 2022-05-18T05:26:28.0111598Z Ran 38 tests in 43.913s 2022-05-18T05:26:28.0111766Z 2022-05-18T05:26:28.0111863Z OK 2022-05-18T05:26:28.0111999Z 2022-05-18T05:26:28.0112121Z Generating XML reports... 2022-05-18T05:26:28.0146138Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:28.0147805Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:28.0149177Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:28.0150699Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:28.0151986Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:28.0153232Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:28.0154499Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:28.0155776Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. 
Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:28.0157029Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:28.0158279Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:28.0159536Z [W python_variable.cpp:205] Warning: Deallocating Tensor that still has live PyObject references. This probably happened because you took out a weak reference to Tensor and didn't call _fix_weakref() after dereferencing it. Subsequent accesses to this tensor via the PyObject will now fail. (function concrete_decref_fn) 2022-05-18T05:26:28.0199124Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_wrap/TEST-TestAutoWrap-20220518052544.xml 2022-05-18T05:26:28.0218369Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_wrap/TEST-TestFSDPWrap-20220518052544.xml 2022-05-18T05:26:28.3060813Z Running distributed/algorithms/test_join ... [2022-05-18 05:26:28.305596] 2022-05-18T05:26:28.3061588Z Executing ['/opt/conda/bin/python', 'distributed/algorithms/test_join.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:26:28.305694] 2022-05-18T05:26:29.1955756Z Test results will be stored in test-reports/python-unittest/distributed.algorithms.test_join 2022-05-18T05:26:29.1973919Z 2022-05-18T05:26:29.1974177Z Running tests... 2022-05-18T05:26:29.1974620Z ---------------------------------------------------------------------- 2022-05-18T05:26:29.1985299Z test_join_kwargs (__main__.TestJoin) 2022-05-18T05:26:30.8555347Z Tests passing keyword arguments to the context manager. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:26:30.8922575Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109625 2022-05-18T05:26:30.9037727Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109626 2022-05-18T05:26:31.7811228Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:31.7813937Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:31.7889876Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:31.7893622Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:31.7894441Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:31.7916669Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:34.5138136Z ok (5.316s) 2022-05-18T05:26:34.5147828Z test_multiple_joinable_disable (__main__.TestJoin) 2022-05-18T05:26:34.5275950Z Tests ``enable=False`` for multiple :class:`Joinable` s. ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109709 2022-05-18T05:26:34.5387807Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109710 2022-05-18T05:26:35.4238015Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:35.4240194Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:35.4401351Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:35.4405190Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:35.4406040Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:35.4445460Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:38.0478406Z ok (3.534s) 2022-05-18T05:26:38.0488773Z test_multiple_joinables (__main__.TestJoin) 2022-05-18T05:26:38.0615435Z Tests the main hooks and post-hooks of multiple :class:`Joinable` s ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109793 2022-05-18T05:26:38.0724790Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109794 2022-05-18T05:26:38.9583833Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:38.9586014Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:38.9605965Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:38.9609290Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:38.9610461Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:38.9689279Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:41.6818416Z ok (3.634s) 2022-05-18T05:26:41.6826595Z test_multiple_joinables_throw (__main__.TestJoin) 2022-05-18T05:26:41.6950911Z Tests ``throw_on_early_termination=True`` for multiple ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109877 2022-05-18T05:26:41.7059688Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109878 2022-05-18T05:26:42.5933912Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:42.5936727Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:42.5947921Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:42.5951479Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:42.5952765Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:42.6039683Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
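The pairs of "Added key: store_based_barrier_key:1" / "Completed store-based barrier" messages above come from c10d's store-based barrier, which runs at the end of process-group initialization once every rank has registered with the shared store. A minimal sketch of the two-rank setup these tests spawn; the backend, address, and port below are illustrative choices, not values taken from this log:

```python
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    # init_process_group runs the store-based barrier that produces the
    # "store_based_barrier_key" INFO lines once all ranks have joined.
    dist.init_process_group(
        backend="gloo",                       # CPU-friendly; GPU suites typically use "nccl"
        init_method="tcp://127.0.0.1:29500",  # illustrative rendezvous address
        rank=rank,
        world_size=world_size,
    )
    dist.barrier()
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```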
2022-05-18T05:26:45.3155923Z ok (3.634s) 2022-05-18T05:26:45.3166156Z test_single_joinable (__main__.TestJoin) 2022-05-18T05:26:45.3292092Z Tests the main hooks and post-hooks of a single :class:`Joinable` ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 109961 2022-05-18T05:26:45.3405421Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 109962 2022-05-18T05:26:46.2203565Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:46.2206359Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:46.2211942Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:46.2216118Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:46.2217100Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:46.2309623Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:48.9500773Z ok (3.634s) 2022-05-18T05:26:48.9510591Z test_single_joinable_disable (__main__.TestJoin) 2022-05-18T05:26:48.9640682Z Tests ``enable=False`` for a single :class:`Joinable`. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110045 2022-05-18T05:26:48.9756438Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110046 2022-05-18T05:26:49.8603747Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:49.8605551Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:49.8865279Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:49.8869224Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:49.8870647Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:49.8913528Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:52.4850692Z ok (3.535s) 2022-05-18T05:26:52.4861864Z test_single_joinable_main_hooks (__main__.TestJoin) 2022-05-18T05:26:52.4990267Z Tests the main hooks of a single :class:`Joinable`. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110129 2022-05-18T05:26:52.5101665Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110130 2022-05-18T05:26:53.3911815Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:53.3913791Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:53.4269403Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:53.4273638Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:53.4275243Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
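The TestJoin cases being run here exercise torch.distributed.algorithms.Join, the generic context manager behind DDP's uneven-input handling: ranks that run out of data "join" early and shadow the collectives of the ranks still training. A rough sketch of typical usage with DDP; the model, batch shapes, and port are invented for illustration:

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.distributed.algorithms.join import Join
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank: int, world_size: int) -> None:
    dist.init_process_group(
        "gloo", init_method="tcp://127.0.0.1:29501",
        rank=rank, world_size=world_size,
    )
    model = DDP(torch.nn.Linear(4, 4))
    # Uneven inputs: rank 1 gets one extra batch. Inside Join, the rank that
    # finishes first shadows the gradient allreduce so the other rank does not hang.
    num_batches = 3 + rank
    with Join([model]):
        for _ in range(num_batches):
            model(torch.ones(8, 4)).sum().backward()
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```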
2022-05-18T05:26:53.4323009Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:56.1196491Z ok (3.634s) 2022-05-18T05:26:56.1204497Z test_single_joinable_post_hooks (__main__.TestJoin) 2022-05-18T05:26:56.1333710Z Tests the post-hooks of a single :class:`Joinable`. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110213 2022-05-18T05:26:56.1445377Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110214 2022-05-18T05:26:57.0238047Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:26:57.0240631Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:26:57.0629234Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:26:57.0632569Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:26:57.0633408Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:57.0648571Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:26:59.7541099Z ok (3.634s) 2022-05-18T05:26:59.7549329Z test_single_joinable_throw (__main__.TestJoin) 2022-05-18T05:26:59.7677807Z Tests ``throw_on_early_termination=True`` for a single ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110297 2022-05-18T05:26:59.7789733Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110298 2022-05-18T05:27:00.6610215Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:27:00.6612910Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:27:00.7027180Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:27:00.7030983Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:27:00.7031794Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:00.7122759Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:03.3884463Z ok (3.634s) 2022-05-18T05:27:03.3884706Z 2022-05-18T05:27:03.3885089Z ---------------------------------------------------------------------- 2022-05-18T05:27:03.3887098Z Ran 9 tests in 34.191s 2022-05-18T05:27:03.3887295Z 2022-05-18T05:27:03.3887393Z OK 2022-05-18T05:27:03.3887539Z 2022-05-18T05:27:03.3889381Z Generating XML reports... 2022-05-18T05:27:03.3941594Z Generated XML report: test-reports/python-unittest/distributed.algorithms.test_join/TEST-TestJoin-20220518052629.xml 2022-05-18T05:27:03.6619239Z Running distributed/fsdp/test_fsdp_comm ... [2022-05-18 05:27:03.661480] 2022-05-18T05:27:03.6619993Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_comm.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 05:27:03.661578] 2022-05-18T05:27:04.6013095Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_comm 2022-05-18T05:27:04.6030510Z 2022-05-18T05:27:04.6030751Z Running tests... 2022-05-18T05:27:04.6031274Z ---------------------------------------------------------------------- 2022-05-18T05:27:04.6058482Z test_communication_nested_model_False_use_no_sync_False_sharding_strategy_None (__main__.TestCommunication) 2022-05-18T05:27:06.2620185Z Tests FSDP's communication cost in terms of calls to collective ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:27:06.2984868Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110418 2022-05-18T05:27:06.3101716Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110419 2022-05-18T05:27:07.2531396Z dist init r=0, world=2 2022-05-18T05:27:07.2535006Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:27:07.2549554Z dist init r=1, world=2 2022-05-18T05:27:07.2554056Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:27:07.2555077Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:07.2638433Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:08.6483487Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:27:08.6484022Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:27:08.6821638Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:08.6822323Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:08.6823169Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:08.6823794Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:09.6192802Z ok (5.016s) 2022-05-18T05:27:09.6220106Z test_communication_nested_model_False_use_no_sync_False_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestCommunication) 2022-05-18T05:27:09.6344001Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110505 2022-05-18T05:27:09.6454086Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110506 2022-05-18T05:27:10.5702024Z dist init r=0, world=2 2022-05-18T05:27:10.5705568Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:27:10.5721929Z dist init r=1, world=2 2022-05-18T05:27:10.5726393Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:27:10.5737677Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:27:10.5808437Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:11.9503411Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:27:11.9822348Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:27:11.9823375Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:11.9824038Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:11.9824876Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:11.9825810Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:12.9543325Z ok (3.335s) 2022-05-18T05:27:12.9570355Z test_communication_nested_model_False_use_no_sync_True_sharding_strategy_None (__main__.TestCommunication) 2022-05-18T05:27:12.9701763Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110592 2022-05-18T05:27:12.9811847Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110593 2022-05-18T05:27:13.8991128Z dist init r=1, world=2 2022-05-18T05:27:13.8994203Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:27:13.9336494Z dist init r=0, world=2 2022-05-18T05:27:13.9341065Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:27:13.9342229Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:13.9402571Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:15.3143064Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:27:15.3143581Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:27:15.3503751Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:15.3504474Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:15.3505340Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
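The UserWarning above ("Module is input on CPU, we are moving it to N ...") is emitted because the test hands a CPU-resident module to FullyShardedDataParallel, which temporarily moves it to the current CUDA device for parameter flattening and sharding. In user code the warning can be avoided by putting the module on the target device before wrapping, or, in releases that support it, by passing device_id; a hedged sketch:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes an initialized process group and one CUDA device owned by this rank.
model = torch.nn.Linear(1024, 1024)

# Option 1: move the module to the GPU first, so FSDP never sees CPU
# parameters and does not emit the "Module is input on CPU" warning.
fsdp_model = FSDP(model.cuda())

# Option 2 (newer releases): let FSDP do the move explicitly.
# fsdp_model = FSDP(model, device_id=torch.cuda.current_device())
```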
2022-05-18T05:27:15.3505961Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:16.2899766Z ok (3.336s) 2022-05-18T05:27:16.2926823Z test_communication_nested_model_False_use_no_sync_True_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestCommunication) 2022-05-18T05:27:16.3050646Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110679 2022-05-18T05:27:16.3159077Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110680 2022-05-18T05:27:17.2059166Z dist init r=0, world=2 2022-05-18T05:27:17.2062689Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:27:17.2358377Z dist init r=1, world=2 2022-05-18T05:27:17.2363068Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:27:17.2364314Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:17.2369057Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:18.6096493Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:27:18.6097018Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:27:18.6464337Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:18.6465407Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:18.6466609Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:18.6467369Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:19.6248329Z ok (3.335s) 2022-05-18T05:27:19.6274824Z test_communication_nested_model_True_use_no_sync_False_sharding_strategy_None (__main__.TestCommunication) 2022-05-18T05:27:19.6399534Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110766 2022-05-18T05:27:19.6508659Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110767 2022-05-18T05:27:20.5706326Z dist init r=1, world=2 2022-05-18T05:27:20.5709795Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:27:20.6108909Z dist init r=0, world=2 2022-05-18T05:27:20.6113623Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:27:20.6114869Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:20.6117788Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
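The parametrized names in this suite encode the FSDP knobs whose communication cost is being counted: sharding_strategy (None falls back to the default FULL_SHARD) and use_no_sync (gradient accumulation that skips gradient communication on all but the last backward). A rough sketch of those settings in user code; the model and batches are made up:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# Assumes an initialized process group and a CUDA device per rank.
model = torch.nn.Linear(1024, 1024).cuda()

# SHARD_GRAD_OP shards gradients and optimizer state but keeps parameters
# gathered after forward, trading memory for fewer all-gathers than FULL_SHARD.
fsdp_model = FSDP(model, sharding_strategy=ShardingStrategy.SHARD_GRAD_OP)

batches = [torch.randn(8, 1024, device="cuda") for _ in range(4)]

# no_sync() accumulates gradients locally; gradient communication happens
# only on the final backward issued outside the context manager.
with fsdp_model.no_sync():
    for batch in batches[:-1]:
        fsdp_model(batch).sum().backward()
fsdp_model(batches[-1]).sum().backward()
```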
2022-05-18T05:27:21.9765777Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:27:21.9766456Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:27:21.9994247Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:21.9995013Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:21.9995866Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:21.9996517Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:22.5587208Z ok (2.934s) 2022-05-18T05:27:22.5613948Z test_communication_nested_model_True_use_no_sync_False_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestCommunication) 2022-05-18T05:27:22.5737189Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110853 2022-05-18T05:27:22.5844929Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110854 2022-05-18T05:27:23.4792186Z dist init r=0, world=2 2022-05-18T05:27:23.4796630Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:27:23.5316690Z dist init r=1, world=2 2022-05-18T05:27:23.5321195Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:27:23.5322022Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:23.5407601Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:24.9037342Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:27:24.9037857Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:27:24.9234117Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:24.9235327Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:24.9236505Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:24.9237171Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:25.4923971Z ok (2.934s) 2022-05-18T05:27:25.4950891Z test_communication_nested_model_True_use_no_sync_True_sharding_strategy_None (__main__.TestCommunication) 2022-05-18T05:27:25.5073976Z Tests FSDP's communication cost in terms of calls to collective ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 110940 2022-05-18T05:27:25.5182914Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 110941 2022-05-18T05:27:26.4369153Z dist init r=0, world=2 2022-05-18T05:27:26.4373015Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:27:26.4670430Z dist init r=1, world=2 2022-05-18T05:27:26.4675292Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:27:26.4676070Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:26.4679650Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:27.8724459Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:27:27.8725008Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:27:27.8954485Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:27.8955320Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:27.8956236Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:27.8956948Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:28.5264112Z ok (3.034s) 2022-05-18T05:27:28.5291836Z test_communication_nested_model_True_use_no_sync_True_sharding_strategy_ShardingStrategy_SHARD_GRAD_OP (__main__.TestCommunication) 2022-05-18T05:27:28.5417538Z Tests FSDP's communication cost in terms of calls to collective ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111027 2022-05-18T05:27:28.5526101Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111028 2022-05-18T05:27:29.4790151Z dist init r=1, world=2 2022-05-18T05:27:29.4793791Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:27:29.5036323Z dist init r=0, world=2 2022-05-18T05:27:29.5040854Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:27:29.5041939Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:29.5100786Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:27:30.8938401Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:27:30.8939282Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:27:30.9156167Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 
2022-05-18T05:27:30.9156983Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:30.9157871Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:27:30.9158510Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:27:31.5606742Z ok (3.034s) 2022-05-18T05:27:31.5606973Z 2022-05-18T05:27:31.5607353Z ---------------------------------------------------------------------- 2022-05-18T05:27:31.5607731Z Ran 8 tests in 26.958s 2022-05-18T05:27:31.5607901Z 2022-05-18T05:27:31.5607998Z OK 2022-05-18T05:27:31.5610720Z 2022-05-18T05:27:31.5611083Z Generating XML reports... 2022-05-18T05:27:31.5666060Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_comm/TEST-TestCommunication-20220518052704.xml 2022-05-18T05:27:31.8306512Z Running distributed/test_c10d_common ... [2022-05-18 05:27:31.830156] 2022-05-18T05:27:31.8307293Z Executing ['/opt/conda/bin/python', 'distributed/test_c10d_common.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:27:31.830257] 2022-05-18T05:27:32.7453986Z test_debug_level (__main__.CommTest) 2022-05-18T05:27:32.7454468Z test_multi_limit_multi_dtype (__main__.ComputeBucketAssignmentTest) 2022-05-18T05:27:32.7455425Z test_multi_limit_single_dtype (__main__.ComputeBucketAssignmentTest) 2022-05-18T05:27:32.7456097Z test_single_limit_multi_dtype (__main__.ComputeBucketAssignmentTest) 2022-05-18T05:27:32.7456558Z test_single_limit_single_dtype (__main__.ComputeBucketAssignmentTest) 2022-05-18T05:27:32.7456995Z test_backend_class_attr (__main__.PythonProcessGroupExtensionTest) 2022-05-18T05:27:32.7457442Z test_collectives (__main__.PythonProcessGroupExtensionTest) 2022-05-18T05:27:32.7457877Z test_get_backend_name (__main__.PythonProcessGroupExtensionTest) 2022-05-18T05:27:32.7458287Z test_send_recv (__main__.PythonProcessGroupExtensionTest) 2022-05-18T05:27:33.6194914Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2022-05-18T05:27:33.6208895Z 2022-05-18T05:27:33.6209400Z Running tests... 2022-05-18T05:27:33.6209974Z ---------------------------------------------------------------------- 2022-05-18T05:27:35.2335423Z test_debug_level (__main__.CommTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:27:35.2692952Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111186 2022-05-18T05:27:35.2802848Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111187 2022-05-18T05:27:36.1579527Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:27:36.1733204Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:27:36.3845475Z ok (2.763s) 2022-05-18T05:27:36.3845716Z 2022-05-18T05:27:36.3846142Z ---------------------------------------------------------------------- 2022-05-18T05:27:36.3846475Z Ran 1 test in 2.764s 2022-05-18T05:27:36.3846646Z 2022-05-18T05:27:36.3846743Z OK 2022-05-18T05:27:36.3846887Z 2022-05-18T05:27:36.3847029Z Generating XML reports... 
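CommTest.test_debug_level, which just passed above, covers c10d's debug-level machinery. Outside the test suite this is normally driven through the TORCH_DISTRIBUTED_DEBUG environment variable (OFF, INFO, or DETAIL); a minimal illustration, assuming it is set before the process group is created:

```python
import os

# Must be set before torch.distributed initializes its process group.
# DETAIL is the most verbose collective logging; INFO is lighter; OFF is the default.
os.environ["TORCH_DISTRIBUTED_DEBUG"] = "DETAIL"

import torch.distributed as dist
# ... dist.init_process_group(...) as in the earlier sketch ...
```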
2022-05-18T05:27:36.3890706Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-CommTest-20220518052733.xml 2022-05-18T05:27:37.5475085Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2022-05-18T05:27:37.5490893Z 2022-05-18T05:27:37.5491385Z Running tests... 2022-05-18T05:27:37.5492286Z ---------------------------------------------------------------------- 2022-05-18T05:27:39.1928333Z test_multi_limit_multi_dtype (__main__.ComputeBucketAssignmentTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:27:39.2052423Z ok (1.656s) 2022-05-18T05:27:39.2053179Z 2022-05-18T05:27:39.2053957Z ---------------------------------------------------------------------- 2022-05-18T05:27:39.2054972Z Ran 1 test in 1.656s 2022-05-18T05:27:39.2055178Z 2022-05-18T05:27:39.2055277Z OK 2022-05-18T05:27:39.2055414Z 2022-05-18T05:27:39.2055526Z Generating XML reports... 2022-05-18T05:27:39.2086745Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20220518052737.xml 2022-05-18T05:27:40.3209177Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2022-05-18T05:27:40.3224025Z 2022-05-18T05:27:40.3224424Z Running tests... 2022-05-18T05:27:40.3225110Z ---------------------------------------------------------------------- 2022-05-18T05:27:41.9554119Z test_multi_limit_single_dtype (__main__.ComputeBucketAssignmentTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:27:41.9675101Z ok (1.645s) 2022-05-18T05:27:41.9675553Z 2022-05-18T05:27:41.9676035Z ---------------------------------------------------------------------- 2022-05-18T05:27:41.9676417Z Ran 1 test in 1.645s 2022-05-18T05:27:41.9676592Z 2022-05-18T05:27:41.9676703Z OK 2022-05-18T05:27:41.9676825Z 2022-05-18T05:27:41.9676958Z Generating XML reports... 2022-05-18T05:27:41.9708397Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20220518052740.xml 2022-05-18T05:27:43.0951639Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2022-05-18T05:27:43.0966721Z 2022-05-18T05:27:43.0967117Z Running tests... 2022-05-18T05:27:43.0967635Z ---------------------------------------------------------------------- 2022-05-18T05:27:44.7434689Z test_single_limit_multi_dtype (__main__.ComputeBucketAssignmentTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:27:44.7557505Z ok (1.659s) 2022-05-18T05:27:44.7558378Z 2022-05-18T05:27:44.7558795Z ---------------------------------------------------------------------- 2022-05-18T05:27:44.7559378Z Ran 1 test in 1.659s 2022-05-18T05:27:44.7559553Z 2022-05-18T05:27:44.7559653Z OK 2022-05-18T05:27:44.7559791Z 2022-05-18T05:27:44.7559938Z Generating XML reports... 2022-05-18T05:27:44.7591079Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20220518052743.xml 2022-05-18T05:27:45.8744312Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2022-05-18T05:27:45.8759338Z 2022-05-18T05:27:45.8759595Z Running tests... 2022-05-18T05:27:45.8760050Z ---------------------------------------------------------------------- 2022-05-18T05:27:47.5234830Z test_single_limit_single_dtype (__main__.ComputeBucketAssignmentTest) ... 
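The ComputeBucketAssignmentTest cases listed here cover how gradients are grouped into fixed-size buckets before they are allreduced. The public knob for that bucketing on DistributedDataParallel is bucket_cap_mb; a sketch with an invented model and an arbitrary 50 MB cap:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# Assumes dist.init_process_group(...) has already run on this rank
# (with a CPU-capable backend such as gloo for this CPU model).
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.Linear(1024, 10),
)

# Gradients are packed into ~50 MB buckets; each bucket is allreduced as soon
# as all of its gradients are ready, overlapping communication with backward.
ddp_model = DDP(model, bucket_cap_mb=50)
```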
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:27:47.5356666Z ok (1.660s) 2022-05-18T05:27:47.5357667Z 2022-05-18T05:27:47.5358056Z ---------------------------------------------------------------------- 2022-05-18T05:27:47.5358403Z Ran 1 test in 1.660s 2022-05-18T05:27:47.5358572Z 2022-05-18T05:27:47.5358671Z OK 2022-05-18T05:27:47.5358811Z 2022-05-18T05:27:47.5358945Z Generating XML reports... 2022-05-18T05:27:47.5390202Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20220518052745.xml 2022-05-18T05:27:48.6413152Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2022-05-18T05:27:48.6427533Z 2022-05-18T05:27:48.6427929Z Running tests... 2022-05-18T05:27:48.6428434Z ---------------------------------------------------------------------- 2022-05-18T05:27:50.2763912Z test_backend_class_attr (__main__.PythonProcessGroupExtensionTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:27:50.3119334Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111439 2022-05-18T05:27:50.3230156Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111440 2022-05-18T05:27:50.3341401Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 111441 2022-05-18T05:27:50.3453238Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 111442 2022-05-18T05:27:51.2751686Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:27:51.2756462Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:27:51.2888024Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:27:51.3278811Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:27:51.5502144Z ok (2.907s) 2022-05-18T05:27:51.5502377Z 2022-05-18T05:27:51.5502781Z ---------------------------------------------------------------------- 2022-05-18T05:27:51.5503129Z Ran 1 test in 2.907s 2022-05-18T05:27:51.5503307Z 2022-05-18T05:27:51.5503407Z OK 2022-05-18T05:27:51.5503549Z 2022-05-18T05:27:51.5503690Z Generating XML reports... 2022-05-18T05:27:51.5547858Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20220518052748.xml 2022-05-18T05:27:52.6690813Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2022-05-18T05:27:52.6705860Z 2022-05-18T05:27:52.6706311Z Running tests... 2022-05-18T05:27:52.6706814Z ---------------------------------------------------------------------- 2022-05-18T05:27:54.3184005Z test_collectives (__main__.PythonProcessGroupExtensionTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:27:54.3539136Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111620 2022-05-18T05:27:54.3647549Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111621 2022-05-18T05:27:54.3758703Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 111622 2022-05-18T05:27:54.3868856Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 111623 2022-05-18T05:27:55.2832118Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:27:55.3042723Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:27:55.3161586Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:27:55.3171810Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T05:27:55.3220513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:27:55.3230017Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:27:56.2846015Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T05:27:56.2877383Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:27:56.2878523Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:27:56.2922714Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:27:56.2948926Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:27:56.2971212Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:27:58.0985666Z ok (5.428s) 2022-05-18T05:27:58.0986028Z 2022-05-18T05:27:58.0986768Z ---------------------------------------------------------------------- 2022-05-18T05:27:58.0987616Z Ran 1 test in 5.428s 2022-05-18T05:27:58.0987784Z 2022-05-18T05:27:58.0987869Z OK 2022-05-18T05:27:58.0988007Z 2022-05-18T05:27:58.0988145Z Generating XML reports... 2022-05-18T05:27:58.1031168Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20220518052752.xml 2022-05-18T05:27:59.2856875Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2022-05-18T05:27:59.2871150Z 2022-05-18T05:27:59.2871616Z Running tests... 2022-05-18T05:27:59.2872507Z ---------------------------------------------------------------------- 2022-05-18T05:28:00.9396652Z test_get_backend_name (__main__.PythonProcessGroupExtensionTest) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:28:00.9755160Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111810 2022-05-18T05:28:00.9864941Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111811 2022-05-18T05:28:00.9976485Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 111812 2022-05-18T05:28:01.0087957Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 111813 2022-05-18T05:28:01.9135819Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:28:01.9136884Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:01.9159399Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:01.9299948Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:28:02.1133190Z ok (2.826s) 2022-05-18T05:28:02.1133619Z 2022-05-18T05:28:02.1134154Z ---------------------------------------------------------------------- 2022-05-18T05:28:02.1134736Z Ran 1 test in 2.826s 2022-05-18T05:28:02.1134920Z 2022-05-18T05:28:02.1135016Z OK 2022-05-18T05:28:02.1135137Z 2022-05-18T05:28:02.1135272Z Generating XML reports... 2022-05-18T05:28:02.1178996Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20220518052759.xml 2022-05-18T05:28:03.2650362Z Test results will be stored in test-reports/python-unittest/distributed.test_c10d_common 2022-05-18T05:28:03.2664575Z 2022-05-18T05:28:03.2664724Z Running tests... 2022-05-18T05:28:03.2665723Z ---------------------------------------------------------------------- 2022-05-18T05:28:04.8783714Z test_send_recv (__main__.PythonProcessGroupExtensionTest) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:28:04.9146202Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 111991 2022-05-18T05:28:04.9258687Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 111992 2022-05-18T05:28:04.9370027Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 111993 2022-05-18T05:28:04.9484524Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 111994 2022-05-18T05:28:05.8434567Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:05.8542347Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:28:05.8552529Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 3 2022-05-18T05:28:05.8672722Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:05.8682408Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:05.8898398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:28:05.8907589Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 2 2022-05-18T05:28:05.8957630Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:05.8959403Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 
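test_send_recv drives point-to-point communication through the test-only Python process-group extension registered by this file. Against a built-in backend, the equivalent calls look like this rough two-rank sketch (address and port are illustrative):

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank: int, world_size: int) -> None:
    dist.init_process_group(
        "gloo", init_method="tcp://127.0.0.1:29502",
        rank=rank, world_size=world_size,
    )
    tensor = torch.zeros(4)
    if rank == 0:
        tensor += 1
        dist.send(tensor, dst=1)   # blocking point-to-point send
    else:
        dist.recv(tensor, src=0)   # blocks until rank 0's tensor arrives
        assert torch.equal(tensor, torch.ones(4))
    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)
```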
2022-05-18T05:28:05.8963098Z INFO:torch.distributed.distributed_c10d:Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:28:05.8991290Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:28:05.9012188Z INFO:torch.distributed.distributed_c10d:Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. 2022-05-18T05:28:07.8574483Z ok (4.591s) 2022-05-18T05:28:07.8574724Z 2022-05-18T05:28:07.8575135Z ---------------------------------------------------------------------- 2022-05-18T05:28:07.8575497Z Ran 1 test in 4.591s 2022-05-18T05:28:07.8575670Z 2022-05-18T05:28:07.8578792Z OK 2022-05-18T05:28:07.8579156Z 2022-05-18T05:28:07.8579530Z Generating XML reports... 2022-05-18T05:28:07.8621473Z Generated XML report: test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20220518052803.xml 2022-05-18T05:28:08.2655878Z Running distributed/fsdp/test_fsdp_meta ... [2022-05-18 05:28:08.265069] 2022-05-18T05:28:08.2656597Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_meta.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:28:08.265171] 2022-05-18T05:28:09.2089126Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_meta 2022-05-18T05:28:09.2107618Z 2022-05-18T05:28:09.2108119Z Running tests... 2022-05-18T05:28:09.2108608Z ---------------------------------------------------------------------- 2022-05-18T05:28:10.8513062Z test_bad_arg_meta (__main__.TestFSDPWithMetaDevice) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:28:10.8889944Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 112181 2022-05-18T05:28:10.9005532Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 112182 2022-05-18T05:28:11.8157636Z dist init r=1, world=2 2022-05-18T05:28:11.8161059Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:11.8183959Z dist init r=0, world=2 2022-05-18T05:28:11.8189031Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:11.8190040Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:11.8264143Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:13.2100375Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:13.2100922Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:13.5079520Z ok (4.297s) 2022-05-18T05:28:13.5087480Z test_bad_arg_torchdistx (__main__.TestFSDPWithMetaDevice) ... skip: Test requires torchdistX: https://github.com/pytorch/torchdistX (0.001s) 2022-05-18T05:28:13.5216978Z test_nested_model_with_meta_device_default_init_auto_wrap_False (__main__.TestFSDPWithMetaDevice) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 112264 2022-05-18T05:28:13.5325247Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 112265 2022-05-18T05:28:14.4451246Z dist init r=1, world=2 2022-05-18T05:28:14.4454693Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:14.4880046Z dist init r=0, world=2 2022-05-18T05:28:14.4884441Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:14.4885615Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:14.4964743Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:15.8795658Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:15.8796206Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:16.5403463Z ok (3.031s) 2022-05-18T05:28:16.5536636Z test_nested_model_with_meta_device_default_init_auto_wrap_True (__main__.TestFSDPWithMetaDevice) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 112351 2022-05-18T05:28:16.5644878Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 112352 2022-05-18T05:28:17.4917741Z dist init r=0, world=2 2022-05-18T05:28:17.4921026Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:17.4961794Z dist init r=1, world=2 2022-05-18T05:28:17.4966785Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:17.4968083Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:17.5024496Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:18.8926705Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:18.8927248Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:19.5724764Z ok (3.032s) 2022-05-18T05:28:19.5859095Z test_nested_model_with_meta_device_reset_params_auto_wrap_False (__main__.TestFSDPWithMetaDevice) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 112438 2022-05-18T05:28:19.5972083Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 112439 2022-05-18T05:28:20.5139457Z dist init r=0, world=2 2022-05-18T05:28:20.5142737Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:20.5334998Z dist init r=1, world=2 2022-05-18T05:28:20.5339420Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:20.5340378Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:20.5347649Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
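The TestFSDPWithMetaDevice cases build modules on the meta device (no real storage) and let FSDP materialize them at wrap time. A hedged sketch of that pattern using FSDP's param_init_fn hook together with Module.to_empty; the module and device choice are illustrative:

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Assumes an initialized process group and one CUDA device per rank.
meta_model = nn.Linear(1024, 1024, device="meta")  # parameters have no storage yet

def materialize(module: nn.Module) -> None:
    # Allocate real (uninitialized) storage on the GPU, then initialize it.
    module.to_empty(device="cuda")
    if hasattr(module, "reset_parameters"):
        module.reset_parameters()

fsdp_model = FSDP(meta_model, param_init_fn=materialize)
```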
2022-05-18T05:28:21.9309911Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:21.9310453Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:22.6052869Z ok (3.033s) 2022-05-18T05:28:22.6186646Z test_nested_model_with_meta_device_reset_params_auto_wrap_True (__main__.TestFSDPWithMetaDevice) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 112525 2022-05-18T05:28:22.6297057Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 112526 2022-05-18T05:28:23.5788647Z dist init r=1, world=2 2022-05-18T05:28:23.5791870Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:23.5876810Z dist init r=0, world=2 2022-05-18T05:28:23.5881335Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:23.5882442Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:23.5894190Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:24.9402570Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:24.9403136Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:25.5375746Z ok (2.932s) 2022-05-18T05:28:25.5384164Z test_nested_model_with_torchdistX_default_init_auto_wrap_False (__main__.TestFSDPWithMetaDevice) ... skip: Test requires torchdistX: https://github.com/pytorch/torchdistX (0.001s) 2022-05-18T05:28:25.5391008Z test_nested_model_with_torchdistX_default_init_auto_wrap_True (__main__.TestFSDPWithMetaDevice) ... skip: Test requires torchdistX: https://github.com/pytorch/torchdistX (0.001s) 2022-05-18T05:28:25.5396819Z test_nested_model_with_torchdistX_init_fn_auto_wrap_False (__main__.TestFSDPWithMetaDevice) ... skip: Test requires torchdistX: https://github.com/pytorch/torchdistX (0.001s) 2022-05-18T05:28:25.5403315Z test_nested_model_with_torchdistX_init_fn_auto_wrap_True (__main__.TestFSDPWithMetaDevice) ... skip: Test requires torchdistX: https://github.com/pytorch/torchdistX (0.001s) 2022-05-18T05:28:25.5531477Z test_simple_model_with_meta_device_default_init (__main__.TestFSDPWithMetaDevice) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 112612 2022-05-18T05:28:25.5642812Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 112613 2022-05-18T05:28:26.4837583Z dist init r=0, world=2 2022-05-18T05:28:26.4841157Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:26.5292589Z dist init r=1, world=2 2022-05-18T05:28:26.5297441Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:26.5298597Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:26.5351193Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:28:27.9254434Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:27.9254991Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:28.5724569Z ok (3.032s) 2022-05-18T05:28:28.5861643Z test_simple_model_with_meta_device_reset_params (__main__.TestFSDPWithMetaDevice) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 112699 2022-05-18T05:28:28.5975856Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 112700 2022-05-18T05:28:29.5118760Z dist init r=0, world=2 2022-05-18T05:28:29.5121829Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:29.5178603Z dist init r=1, world=2 2022-05-18T05:28:29.5183452Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:29.5184500Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:29.5225779Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:30.8917167Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:30.8917959Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:31.5054692Z ok (2.933s) 2022-05-18T05:28:31.5062515Z test_simple_model_with_torchdistX_default_init (__main__.TestFSDPWithMetaDevice) ... skip: Test requires torchdistX: https://github.com/pytorch/torchdistX (0.001s) 2022-05-18T05:28:31.5067942Z test_simple_model_with_torchdistX_init_fn (__main__.TestFSDPWithMetaDevice) ... skip: Test requires torchdistX: https://github.com/pytorch/torchdistX (0.000s) 2022-05-18T05:28:31.5068671Z 2022-05-18T05:28:31.5069075Z ---------------------------------------------------------------------- 2022-05-18T05:28:31.5072010Z Ran 14 tests in 22.296s 2022-05-18T05:28:31.5072256Z 2022-05-18T05:28:31.5072827Z OK (skipped=7) 2022-05-18T05:28:31.5073283Z 2022-05-18T05:28:31.5073425Z Generating XML reports... 2022-05-18T05:28:31.5131199Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_meta/TEST-TestFSDPWithMetaDevice-20220518052809.xml 2022-05-18T05:28:31.7860526Z Running distributed/fsdp/test_fsdp_misc ... [2022-05-18 05:28:31.785566] 2022-05-18T05:28:31.7861492Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_misc.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:28:31.785666] 2022-05-18T05:28:32.7289120Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_misc 2022-05-18T05:28:32.7313760Z 2022-05-18T05:28:32.7314117Z Running tests... 2022-05-18T05:28:32.7314630Z ---------------------------------------------------------------------- 2022-05-18T05:28:32.7325010Z test_device_id_auto_wrap (__main__.TestFSDPMisc) 2022-05-18T05:28:34.3708037Z Test auto wrapping propagates the device id. ... 
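test_device_id_auto_wrap combines an explicit device id with FSDP's automatic wrapping. In user code the auto-wrapping half is requested with an auto_wrap_policy; a sketch using the size-based policy found in recent releases (the 100k-parameter threshold is arbitrary):

```python
import functools
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import size_based_auto_wrap_policy

# Assumes an initialized process group and a CUDA device per rank.
model = nn.Sequential(
    nn.Linear(1024, 1024),
    nn.Linear(1024, 1024),
    nn.Linear(1024, 10),
).cuda()

# Submodules with at least 100k parameters become their own FSDP units;
# everything smaller stays in the root wrapper.
policy = functools.partial(size_based_auto_wrap_policy, min_num_params=100_000)
fsdp_model = FSDP(model, auto_wrap_policy=policy)
```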
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:28:34.4083411Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 112823 2022-05-18T05:28:34.4200296Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 112824 2022-05-18T05:28:35.3754238Z dist init r=1, world=2 2022-05-18T05:28:35.3757485Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:35.3948836Z dist init r=0, world=2 2022-05-18T05:28:35.3953783Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:35.3954559Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:35.3962048Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:36.7838389Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:36.7838968Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:37.1278596Z ok (4.396s) 2022-05-18T05:28:37.1293369Z test_fsdp_cpu_init_stays_on_cpu (__main__.TestFSDPMisc) 2022-05-18T05:28:37.1425691Z Ensure that CPU model input stays on CPU ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 112906 2022-05-18T05:28:37.1541051Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 112907 2022-05-18T05:28:38.0730044Z dist init r=0, world=2 2022-05-18T05:28:38.0733665Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:38.1225122Z dist init r=1, world=2 2022-05-18T05:28:38.1229506Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:38.1230387Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:38.1242929Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:39.4892903Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:39.4893463Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:40.1622019Z ok (3.034s) 2022-05-18T05:28:40.1645341Z test_fsdp_device_id_use_index_False (__main__.TestFSDPMisc) 2022-05-18T05:28:40.1775049Z If CPU module is passed into FSDP with device_id ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 112993 2022-05-18T05:28:40.1888850Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 112994 2022-05-18T05:28:41.1009458Z dist init r=1, world=2 2022-05-18T05:28:41.1012981Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:41.1283262Z dist init r=0, world=2 2022-05-18T05:28:41.1287646Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:41.1288463Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:41.1319617Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
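Note: test_device_id_auto_wrap and the test_fsdp_device_id_use_index_* cases above revolve around FSDP's device_id argument, which moves a CPU-initialized module onto the given CUDA device while wrapping. A hedged sketch of that usage, assuming the default process group has already been initialized and using a toy module in place of the test's real model:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    net = torch.nn.Linear(16, 16)        # starts on CPU
    dev = torch.cuda.current_device()    # an index; torch.device("cuda", dev) also works
    # Requires torch.distributed.init_process_group(...) to have run on this rank first.
    fsdp_net = FSDP(net, device_id=dev)  # parameters are placed and sharded on this device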
2022-05-18T05:28:42.5194038Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:42.5194618Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:42.8963641Z ok (2.734s) 2022-05-18T05:28:42.8986644Z test_fsdp_device_id_use_index_True (__main__.TestFSDPMisc) 2022-05-18T05:28:42.9111718Z If CPU module is passed into FSDP with device_id ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 113076 2022-05-18T05:28:42.9222925Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 113077 2022-05-18T05:28:43.8607286Z dist init r=0, world=2 2022-05-18T05:28:43.8610347Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:43.8627915Z dist init r=1, world=2 2022-05-18T05:28:43.8632555Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:43.8634339Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:43.8713519Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:45.2340468Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:45.2340995Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:45.5295675Z ok (2.633s) 2022-05-18T05:28:45.5308688Z test_fsdp_same_model_across_ranks (__main__.TestFSDPMisc) 2022-05-18T05:28:45.5434151Z FSDP broadcasts model from rank 0 to ensure it starts off with the same ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 113159 2022-05-18T05:28:45.5545362Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 113160 2022-05-18T05:28:46.4754970Z dist init r=0, world=2 2022-05-18T05:28:46.4758333Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:46.5162446Z dist init r=1, world=2 2022-05-18T05:28:46.5167748Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:46.5169031Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:46.5170582Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:47.8954700Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:47.8955305Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:48.2619971Z ok (2.732s) 2022-05-18T05:28:48.2629041Z test_module_device_mismatches_device_id (__main__.TestFSDPMisc) 2022-05-18T05:28:48.2756179Z FSDP raises errors when module is on a GPU that does ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 113242 2022-05-18T05:28:48.2867397Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 113243 2022-05-18T05:28:49.1838057Z dist init r=1, world=2 2022-05-18T05:28:49.1841549Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:49.1993151Z dist init r=0, world=2 2022-05-18T05:28:49.1997707Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:49.1998772Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:49.2046769Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:50.5618032Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:50.5618848Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:50.8938150Z ok (2.632s) 2022-05-18T05:28:50.8947289Z test_multi_device_not_supported (__main__.TestFSDPMisc) 2022-05-18T05:28:50.9077684Z FSDP throws appropriate error when we wrap multi-device module. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 113325 2022-05-18T05:28:50.9192686Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 113326 2022-05-18T05:28:51.8782828Z dist init r=1, world=2 2022-05-18T05:28:51.8786330Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:51.8943692Z dist init r=0, world=2 2022-05-18T05:28:51.8948465Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:51.8949254Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:51.8991526Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:53.2887834Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:53.2888370Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:53.6265905Z ok (2.733s) 2022-05-18T05:28:53.6277515Z test_no_params (__main__.TestFSDPMisc) 2022-05-18T05:28:53.6405402Z Test that device_id and cpu init work if module has no params ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 113408 2022-05-18T05:28:53.6517501Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 113409 2022-05-18T05:28:54.5633515Z dist init r=0, world=2 2022-05-18T05:28:54.5636602Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:28:54.5816921Z dist init r=1, world=2 2022-05-18T05:28:54.5821435Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:28:54.5822652Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:28:54.5841317Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
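Note: the repeated "Added key: store_based_barrier_key:1" and "Completed store-based barrier ... with 2 nodes" INFO lines in every run above are emitted from inside torch.distributed.init_process_group while the two worker processes rendezvous. A minimal per-rank setup sketch, assuming the usual env:// rendezvous variables are provided by the launcher:

    import torch
    import torch.distributed as dist

    def setup() -> None:
        # MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE are assumed to be set by the launcher.
        dist.init_process_group(backend="nccl", init_method="env://")
        torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    def teardown() -> None:
        dist.destroy_process_group()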
2022-05-18T05:28:55.9754476Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:28:55.9755014Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:28:56.2589780Z ok (2.632s) 2022-05-18T05:28:56.2589967Z 2022-05-18T05:28:56.2590345Z ---------------------------------------------------------------------- 2022-05-18T05:28:56.2590687Z Ran 8 tests in 23.528s 2022-05-18T05:28:56.2592138Z 2022-05-18T05:28:56.2592340Z OK 2022-05-18T05:28:56.2592503Z 2022-05-18T05:28:56.2592660Z Generating XML reports... 2022-05-18T05:28:56.2645864Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMisc-20220518052832.xml 2022-05-18T05:28:56.5291121Z Running distributed/_shard/checkpoint/test_checkpoint ... [2022-05-18 05:28:56.528632] 2022-05-18T05:28:56.5291924Z Executing ['/opt/conda/bin/python', 'distributed/_shard/checkpoint/test_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:28:56.528725] 2022-05-18T05:28:57.4674801Z Test results will be stored in test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint 2022-05-18T05:28:57.4694040Z 2022-05-18T05:28:57.4694293Z Running tests... 2022-05-18T05:28:57.4694722Z ---------------------------------------------------------------------- 2022-05-18T05:28:59.1411736Z test_checkpoint_has_shard_overlap (__main__.TestDistributedCheckpointing) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:28:59.1782203Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 113528 2022-05-18T05:28:59.1896857Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 113529 2022-05-18T05:29:00.1012687Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:00.1016547Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:00.1250644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:00.1256050Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:00.1256892Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:00.1323369Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:01.7970599Z ok (4.327s) 2022-05-18T05:29:01.8107741Z test_checkpoint_has_shard_too_small (__main__.TestDistributedCheckpointing) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 113611 2022-05-18T05:29:01.8217796Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 113612 2022-05-18T05:29:02.7129119Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:02.7133112Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:02.7254187Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:02.7259101Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:02.7260415Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
2022-05-18T05:29:02.7337986Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:04.4290539Z ok (2.632s) 2022-05-18T05:29:04.4429054Z test_checkpoint_has_storage_type_mismatch (__main__.TestDistributedCheckpointing) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 113694 2022-05-18T05:29:04.4540671Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 113695 2022-05-18T05:29:05.3404655Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:05.3408186Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:05.3564123Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:05.3568793Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:05.3570076Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:05.3614185Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:07.0612483Z ok (2.632s) 2022-05-18T05:29:07.0758797Z test_storage_key_mapping (__main__.TestDistributedCheckpointing) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 113777 2022-05-18T05:29:07.0868549Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 113778 2022-05-18T05:29:08.0000072Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:08.0003453Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:08.0348852Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:08.0354156Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:08.0355489Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:08.0411843Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:09.6939282Z ok (2.633s) 2022-05-18T05:29:09.7073277Z test_tensor_metadata_with_missing_rank_spec (__main__.TestDistributedCheckpointing) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 113860 2022-05-18T05:29:09.7181482Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 113861 2022-05-18T05:29:10.6310003Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:10.6312976Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:10.6342793Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:10.6347428Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:10.6348442Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
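Note: the TestDistributedCheckpointing cases above validate sharded-checkpoint metadata from the prototype torch.distributed._shard.checkpoint package. A rough sketch of how that package was typically driven at the time; the save_state_dict and FileSystemWriter names are recalled from that era's prototype API and should be treated as assumptions, as should the placeholder model, the /tmp/ckpt path, and the already-initialized process group:

    import torch
    import torch.distributed._shard.checkpoint as dist_cp

    model = torch.nn.Linear(4, 4)  # placeholder; the real tests save ShardedTensor state dicts
    # Assumes dist.init_process_group(...) already ran; saving performs collectives across ranks.
    dist_cp.save_state_dict(
        state_dict=model.state_dict(),
        storage_writer=dist_cp.FileSystemWriter("/tmp/ckpt"),  # illustrative path
    )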
2022-05-18T05:29:10.6416267Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:12.3252517Z ok (2.631s) 2022-05-18T05:29:12.3397241Z test_validate_metadata (__main__.TestDistributedCheckpointing) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 113943 2022-05-18T05:29:12.3510487Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 113944 2022-05-18T05:29:13.2590561Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:13.2591348Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:13.2595104Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:13.2596549Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:13.2597656Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:13.2698445Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:14.9584399Z ok (2.633s) 2022-05-18T05:29:14.9608362Z test_create_key_handles_collision (__main__.TestStorageKeys) ... ok (0.002s) 2022-05-18T05:29:14.9610418Z 2022-05-18T05:29:14.9611355Z ---------------------------------------------------------------------- 2022-05-18T05:29:14.9611790Z Ran 7 tests in 17.492s 2022-05-18T05:29:14.9611972Z 2022-05-18T05:29:14.9612066Z OK 2022-05-18T05:29:14.9612210Z 2022-05-18T05:29:14.9612350Z Generating XML reports... 2022-05-18T05:29:14.9673759Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint/TEST-TestDistributedCheckpointing-20220518052857.xml 2022-05-18T05:29:14.9677253Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint/TEST-TestStorageKeys-20220518052857.xml 2022-05-18T05:29:15.2304824Z Running distributed/fsdp/test_fsdp_checkpoint ... [2022-05-18 05:29:15.230024] 2022-05-18T05:29:15.2305571Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:29:15.230120] 2022-05-18T05:29:16.1745990Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_checkpoint 2022-05-18T05:29:16.1762677Z 2022-05-18T05:29:16.1762869Z Running tests... 2022-05-18T05:29:16.1763325Z ---------------------------------------------------------------------- 2022-05-18T05:29:17.8423011Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=False)_offload_activations_False (__main__.TestFSDPCheckpoint) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:29:17.8799451Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 114063 2022-05-18T05:29:17.8914519Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 114064 2022-05-18T05:29:18.8290044Z dist init r=0, world=2 2022-05-18T05:29:18.8293883Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:18.8508739Z dist init r=1, world=2 2022-05-18T05:29:18.8513056Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:18.8514240Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:18.8600096Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:20.2585121Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:20.2585666Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:20.9999749Z ok (4.823s) 2022-05-18T05:29:21.0032591Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=False)_offload_activations_True (__main__.TestFSDPCheckpoint) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/71418 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. (0.003s) 2022-05-18T05:29:21.0186389Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=True)_offload_activations_False (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 114150 2022-05-18T05:29:21.0298474Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 114151 2022-05-18T05:29:21.9399235Z dist init r=1, world=2 2022-05-18T05:29:21.9402542Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:21.9584924Z dist init r=0, world=2 2022-05-18T05:29:21.9589384Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:21.9590481Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:21.9607254Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:23.3449763Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:23.3450659Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:24.0381025Z ok (3.035s) 2022-05-18T05:29:24.0416628Z test_basic_checkpoint_end_to_end_cpu_offload_CPUOffload(offload_params=True)_offload_activations_True (__main__.TestFSDPCheckpoint) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/70368 for platform(s) win, linux. If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. 
(0.003s) 2022-05-18T05:29:24.0559465Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=False)_offload_activations_False (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 114237 2022-05-18T05:29:24.0668291Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 114238 2022-05-18T05:29:24.9809907Z dist init r=1, world=2 2022-05-18T05:29:24.9814089Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:25.0186873Z dist init r=0, world=2 2022-05-18T05:29:25.0190782Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:25.0191577Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:25.0222163Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:26.4005916Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:26.4006518Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:27.0750790Z ok (3.033s) 2022-05-18T05:29:27.0777377Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=False)_offload_activations_True (__main__.TestFSDPCheckpoint) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/71009 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. (0.003s) 2022-05-18T05:29:27.0919471Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=True)_offload_activations_False (__main__.TestFSDPCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 114324 2022-05-18T05:29:27.1028232Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 114325 2022-05-18T05:29:28.0174182Z dist init r=1, world=2 2022-05-18T05:29:28.0178077Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:28.0461012Z dist init r=0, world=2 2022-05-18T05:29:28.0465335Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:28.0466414Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:28.0484475Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:29.4288120Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:29.4288644Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:30.1109835Z ok (3.033s) 2022-05-18T05:29:30.1136842Z test_checkpoint_fsdp_wrapping_cpu_offload_CPUOffload(offload_params=True)_offload_activations_True (__main__.TestFSDPCheckpoint) ... skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/71349 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. 
(0.003s) 2022-05-18T05:29:30.1137596Z 2022-05-18T05:29:30.1137890Z ---------------------------------------------------------------------- 2022-05-18T05:29:30.1138235Z Ran 8 tests in 13.937s 2022-05-18T05:29:30.1138385Z 2022-05-18T05:29:30.1138496Z OK (skipped=4) 2022-05-18T05:29:30.1138654Z 2022-05-18T05:29:30.1138787Z Generating XML reports... 2022-05-18T05:29:30.1190761Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_checkpoint/TEST-TestFSDPCheckpoint-20220518052916.xml 2022-05-18T05:29:30.3871605Z Running distributed/_shard/checkpoint/test_file_system_checkpoint ... [2022-05-18 05:29:30.386631] 2022-05-18T05:29:30.3872735Z Executing ['/opt/conda/bin/python', 'distributed/_shard/checkpoint/test_file_system_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:29:30.386730] 2022-05-18T05:29:31.3286164Z Test results will be stored in test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint 2022-05-18T05:29:31.3306472Z 2022-05-18T05:29:31.3306917Z Running tests... 2022-05-18T05:29:31.3307636Z ---------------------------------------------------------------------- 2022-05-18T05:29:32.9735139Z test_load_rowwise_to_colwise (__main__.TestDistributedReshardOnLoad) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:29:33.0102758Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 114448 2022-05-18T05:29:33.0221249Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 114449 2022-05-18T05:29:33.9725146Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:33.9728830Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:33.9921559Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:33.9926396Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:33.9927240Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:33.9933405Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:35.7298867Z ok (4.399s) 2022-05-18T05:29:35.7463606Z test_load_with_different_shard_plan (__main__.TestDistributedReshardOnLoad) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 114531 2022-05-18T05:29:35.7578847Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 114532 2022-05-18T05:29:36.6748529Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:36.6751996Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:36.7190685Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:36.7195604Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:36.7196655Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:36.7260934Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
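Note: the TestFSDPCheckpoint variants earlier in the log are parametrized over CPUOffload(offload_params=True/False) and activation offloading. A minimal sketch of the parameter-offload half, assuming an initialized process group and a placeholder model; the activation-offloading dimension is driven by a separate checkpoint wrapper that is not shown here:

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload

    model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.Linear(16, 16)).cuda()
    # Keep sharded parameters on CPU between uses; FSDP moves them to GPU for compute.
    fsdp_model = FSDP(model, cpu_offload=CPUOffload(offload_params=True))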
2022-05-18T05:29:38.4653941Z ok (2.735s) 2022-05-18T05:29:38.4789204Z test_save_load_bytes (__main__.TestDistributedReshardOnLoad) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 114614 2022-05-18T05:29:38.4899666Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 114615 2022-05-18T05:29:39.4352667Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:39.4356134Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:39.4464624Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:39.4469841Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:39.4470667Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:39.4561212Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:41.0971980Z ok (2.632s) 2022-05-18T05:29:41.1250785Z test_read_write_only_tensor (__main__.TestDistributedStateDictSaveLoad) ... ok (0.028s) 2022-05-18T05:29:41.1388327Z test_read_write_shard_tensor (__main__.TestDistributedStateDictSaveLoadWithSharedTensor) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 114697 2022-05-18T05:29:41.1497545Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 114698 2022-05-18T05:29:42.0455056Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:42.0458582Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:42.0526201Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:42.0531769Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:42.0533025Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:42.0561346Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:43.7571556Z ok (2.632s) 2022-05-18T05:29:43.7571923Z 2022-05-18T05:29:43.7572352Z ---------------------------------------------------------------------- 2022-05-18T05:29:43.7572696Z Ran 5 tests in 12.426s 2022-05-18T05:29:43.7572868Z 2022-05-18T05:29:43.7572967Z OK 2022-05-18T05:29:43.7574441Z 2022-05-18T05:29:43.7575071Z Generating XML reports... 2022-05-18T05:29:43.7633270Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint/TEST-TestDistributedReshardOnLoad-20220518052931.xml 2022-05-18T05:29:43.7637285Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint/TEST-TestDistributedStateDictSaveLoad-20220518052931.xml 2022-05-18T05:29:43.7641802Z Generated XML report: test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint/TEST-TestDistributedStateDictSaveLoadWithSharedTensor-20220518052931.xml 2022-05-18T05:29:44.0259550Z Running distributed/fsdp/test_fsdp_apply ... 
[2022-05-18 05:29:44.025442] 2022-05-18T05:29:44.0260325Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_apply.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:29:44.025544] 2022-05-18T05:29:44.9447273Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_apply 2022-05-18T05:29:44.9470845Z 2022-05-18T05:29:44.9471107Z Running tests... 2022-05-18T05:29:44.9471573Z ---------------------------------------------------------------------- 2022-05-18T05:29:44.9479613Z test_apply_in_summon_raises_error (__main__.TestApply) 2022-05-18T05:29:46.6070108Z Ensures that if user calls apply() on FSDP instance within full param ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:29:46.6443797Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 114817 2022-05-18T05:29:46.6559755Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 114818 2022-05-18T05:29:47.5602035Z dist init r=1, world=2 2022-05-18T05:29:47.5605249Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:47.5634493Z dist init r=0, world=2 2022-05-18T05:29:47.5639080Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:47.5640295Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:47.5708433Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:48.9417503Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:48.9418031Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:48.9745512Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:29:48.9746190Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:29:48.9776113Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 
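Note: the module dump and traceback that follow are expected output from test_apply_in_summon_raises_error. FSDP.apply() asserts the wrapper is in the IDLE training state, so calling it while full parameters are summoned trips the SUMMON_FULL_PARAMS check printed below. A hedged sketch of that scenario, assuming an initialized process group and a CUDA device; the module is a placeholder, and summon_full_params is written in the instance-method context-manager form used around this period (the exact form varies across versions):

    import torch
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def init_linear_weights(m):
        if isinstance(m, torch.nn.Linear):
            torch.nn.init.ones_(m.weight)

    fsdp_model = FSDP(torch.nn.Linear(16, 16).cuda())
    fsdp_model.apply(init_linear_weights)        # OK: training state is IDLE
    with fsdp_model.summon_full_params():        # gathers the full, unsharded parameters
        fsdp_model.apply(init_linear_weights)    # expected to error: state is SUMMON_FULL_PARAMS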
2022-05-18T05:29:48.9776782Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:29:48.9861198Z Asserting FSDP instance is: FullyShardedDataParallel( 2022-05-18T05:29:48.9861614Z (_fsdp_wrapped_module): FlattenParamsWrapper( 2022-05-18T05:29:48.9861980Z (_fpw_module): TransformerWithSharedParams( 2022-05-18T05:29:48.9862316Z (embed_tokens): Embedding(23, 16) 2022-05-18T05:29:48.9862605Z (transformer): Transformer( 2022-05-18T05:29:48.9862903Z (encoder): TransformerEncoder( 2022-05-18T05:29:48.9863190Z (layers): ModuleList( 2022-05-18T05:29:48.9864004Z (0): TransformerEncoderLayer( 2022-05-18T05:29:48.9864459Z (self_attn): MultiheadAttention( 2022-05-18T05:29:48.9865081Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2022-05-18T05:29:48.9865726Z ) 2022-05-18T05:29:48.9866148Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2022-05-18T05:29:48.9866543Z (dropout): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9866904Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2022-05-18T05:29:48.9867435Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-05-18T05:29:48.9868125Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-05-18T05:29:48.9868951Z (dropout1): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9869445Z (dropout2): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9869709Z ) 2022-05-18T05:29:48.9869995Z (1): TransformerEncoderLayer( 2022-05-18T05:29:48.9870320Z (self_attn): MultiheadAttention( 2022-05-18T05:29:48.9870746Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2022-05-18T05:29:48.9871116Z ) 2022-05-18T05:29:48.9871430Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2022-05-18T05:29:48.9871789Z (dropout): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9872133Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2022-05-18T05:29:48.9872616Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-05-18T05:29:48.9873466Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-05-18T05:29:48.9874090Z (dropout1): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9874709Z (dropout2): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9875203Z ) 2022-05-18T05:29:48.9875583Z ) 2022-05-18T05:29:48.9876208Z (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-05-18T05:29:48.9876519Z ) 2022-05-18T05:29:48.9876795Z (decoder): TransformerDecoder( 2022-05-18T05:29:48.9877069Z (layers): ModuleList( 2022-05-18T05:29:48.9877368Z (0): TransformerDecoderLayer( 2022-05-18T05:29:48.9877698Z (self_attn): MultiheadAttention( 2022-05-18T05:29:48.9878149Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2022-05-18T05:29:48.9878493Z ) 2022-05-18T05:29:48.9878783Z (multihead_attn): MultiheadAttention( 2022-05-18T05:29:48.9879213Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2022-05-18T05:29:48.9879759Z ) 2022-05-18T05:29:48.9880070Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2022-05-18T05:29:48.9880433Z (dropout): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9880799Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2022-05-18T05:29:48.9881338Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-05-18T05:29:48.9881820Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-05-18T05:29:48.9882275Z (norm3): LayerNorm((16,), 
eps=1e-05, elementwise_affine=True) 2022-05-18T05:29:48.9882608Z (dropout1): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9882947Z (dropout2): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9883286Z (dropout3): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9883545Z ) 2022-05-18T05:29:48.9883826Z (1): TransformerDecoderLayer( 2022-05-18T05:29:48.9884146Z (self_attn): MultiheadAttention( 2022-05-18T05:29:48.9884546Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2022-05-18T05:29:48.9884906Z ) 2022-05-18T05:29:48.9885190Z (multihead_attn): MultiheadAttention( 2022-05-18T05:29:48.9885615Z (out_proj): NonDynamicallyQuantizableLinear(in_features=16, out_features=16, bias=True) 2022-05-18T05:29:48.9885954Z ) 2022-05-18T05:29:48.9886259Z (linear1): Linear(in_features=16, out_features=8, bias=True) 2022-05-18T05:29:48.9886615Z (dropout): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9886955Z (linear2): Linear(in_features=8, out_features=16, bias=True) 2022-05-18T05:29:48.9887414Z (norm1): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-05-18T05:29:48.9887875Z (norm2): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-05-18T05:29:48.9888315Z (norm3): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-05-18T05:29:48.9888667Z (dropout1): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9889004Z (dropout2): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9889337Z (dropout3): Dropout(p=0.1, inplace=False) 2022-05-18T05:29:48.9889601Z ) 2022-05-18T05:29:48.9889826Z ) 2022-05-18T05:29:48.9890207Z (norm): LayerNorm((16,), eps=1e-05, elementwise_affine=True) 2022-05-18T05:29:48.9890837Z ) 2022-05-18T05:29:48.9891064Z ) 2022-05-18T05:29:48.9891371Z (output_proj): Linear(in_features=16, out_features=23, bias=True) 2022-05-18T05:29:48.9891866Z (bn): BatchNorm1d(2, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) 2022-05-18T05:29:48.9892195Z ) 2022-05-18T05:29:48.9892411Z ) 2022-05-18T05:29:48.9892609Z ) 2022-05-18T05:29:48.9892985Z ERROR: expected to be in states [] but current state is TrainingState_.SUMMON_FULL_PARAMS 2022-05-18T05:29:48.9893383Z File "", line 1, in 2022-05-18T05:29:48.9893743Z File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 105, in spawn_main 2022-05-18T05:29:48.9894090Z exitcode = _main(fd) 2022-05-18T05:29:48.9894444Z File "/opt/conda/lib/python3.7/multiprocessing/spawn.py", line 118, in _main 2022-05-18T05:29:48.9894792Z return self._bootstrap() 2022-05-18T05:29:48.9895146Z File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap 2022-05-18T05:29:48.9895480Z self.run() 2022-05-18T05:29:48.9895815Z File "/opt/conda/lib/python3.7/multiprocessing/process.py", line 99, in run 2022-05-18T05:29:48.9896165Z self._target(*self._args, **self._kwargs) 2022-05-18T05:29:48.9896690Z File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_fsdp.py", line 429, in _run 2022-05-18T05:29:48.9897085Z self.run_test(test_name, pipe) 2022-05-18T05:29:48.9897707Z File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 618, in run_test 2022-05-18T05:29:48.9898112Z getattr(self, test_name)() 2022-05-18T05:29:48.9898628Z File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 499, in wrapper 2022-05-18T05:29:48.9898999Z fn() 2022-05-18T05:29:48.9899539Z File "/opt/conda/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 141, in wrapper 
2022-05-18T05:29:48.9899948Z return func(*args, **kwargs) 2022-05-18T05:29:48.9900376Z File "/var/lib/jenkins/workspace/test/distributed/fsdp/test_fsdp_apply.py", line 100, in test_apply_in_summon_raises_error 2022-05-18T05:29:48.9900806Z transformer.apply(self._init_linear_weights) 2022-05-18T05:29:48.9901377Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 1104, in apply 2022-05-18T05:29:48.9901822Z self._assert_state(TrainingState_.IDLE) 2022-05-18T05:29:48.9902396Z File "/opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 3298, in _assert_state 2022-05-18T05:29:48.9902790Z traceback.print_stack() 2022-05-18T05:29:49.2636623Z ok (4.316s) 2022-05-18T05:29:49.2644364Z test_nested_module_apply (__main__.TestApply) 2022-05-18T05:29:49.2770655Z Checks apply() modifies weights appropriately on a nested FSDP instance. ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 114900 2022-05-18T05:29:49.2881458Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 114901 2022-05-18T05:29:50.2272958Z dist init r=0, world=2 2022-05-18T05:29:50.2276335Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:50.2347934Z dist init r=1, world=2 2022-05-18T05:29:50.2353241Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:50.2354714Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:50.2379752Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:51.6115762Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:51.6116776Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:51.6324993Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:29:51.6326501Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:29:51.6328189Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:29:51.6329456Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:29:51.9956147Z ok (2.732s) 2022-05-18T05:29:51.9962177Z test_transformer_module_apply (__main__.TestApply) 2022-05-18T05:29:52.0088899Z Checks apply() modifies weights appropriately on a wrapped Transformer ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 114983 2022-05-18T05:29:52.0198564Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 114984 2022-05-18T05:29:52.8793049Z dist init r=1, world=2 2022-05-18T05:29:52.8796441Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:29:52.9240923Z dist init r=0, world=2 2022-05-18T05:29:52.9245616Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:29:52.9246996Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:52.9306484Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:29:54.2978276Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:54.2979294Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:54.3305288Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:29:54.3306569Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:29:54.3308214Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:29:54.3309424Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:29:55.0288205Z ok (3.033s) 2022-05-18T05:29:55.0288423Z 2022-05-18T05:29:55.0288849Z ---------------------------------------------------------------------- 2022-05-18T05:29:55.0289217Z Ran 3 tests in 10.082s 2022-05-18T05:29:55.0289390Z 2022-05-18T05:29:55.0289469Z OK 2022-05-18T05:29:55.0289606Z 2022-05-18T05:29:55.0289741Z Generating XML reports... 2022-05-18T05:29:55.0340202Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_apply/TEST-TestApply-20220518052944.xml 2022-05-18T05:29:55.2996601Z Running distributed/_shard/test_partial_tensor ... [2022-05-18 05:29:55.299168] 2022-05-18T05:29:55.2997358Z Executing ['/opt/conda/bin/python', 'distributed/_shard/test_partial_tensor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:29:55.299263] 2022-05-18T05:29:56.2231580Z Test results will be stored in test-reports/python-unittest/distributed._shard.test_partial_tensor 2022-05-18T05:29:56.2249214Z 2022-05-18T05:29:56.2249361Z Running tests... 2022-05-18T05:29:56.2250350Z ---------------------------------------------------------------------- 2022-05-18T05:29:57.8767963Z test_cat (__main__.TestPartialTensorOps) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:29:57.9135416Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 115103 2022-05-18T05:29:57.9251752Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 115104 2022-05-18T05:29:57.9368936Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 115105 2022-05-18T05:29:57.9487369Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 115106 2022-05-18T05:29:58.8480650Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:29:58.8567665Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:29:58.8594723Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:29:58.9199899Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:29:59.1536291Z skip: Need at least 4 CUDA devices (2.928s) 2022-05-18T05:29:59.1676181Z test_cat_errors (__main__.TestPartialTensorOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 115247 2022-05-18T05:29:59.1792399Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 115248 2022-05-18T05:29:59.1905464Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 115249 2022-05-18T05:29:59.2019566Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 115250 2022-05-18T05:30:00.1116222Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:00.1179784Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:00.1326979Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:00.1592003Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:00.3060173Z skip: Need at least 4 CUDA devices (1.152s) 2022-05-18T05:30:00.3193081Z test_transpose (__main__.TestPartialTensorOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 115391 2022-05-18T05:30:00.3302539Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 115392 2022-05-18T05:30:00.3416895Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 115393 2022-05-18T05:30:00.3528887Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 115394 2022-05-18T05:30:01.2503501Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:01.2533251Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:01.2560549Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:01.2597368Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:01.4570174Z skip: Need at least 4 CUDA devices (1.151s) 2022-05-18T05:30:01.4706931Z test_partial_tensor_reshard (__main__.TestPartialTensorReshard) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 115535 2022-05-18T05:30:01.4818532Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 115536 2022-05-18T05:30:01.4934527Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 115537 2022-05-18T05:30:01.5046049Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 115538 2022-05-18T05:30:02.4057908Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:02.4447869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:02.4567135Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:02.4688809Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:02.7089424Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:30:02.7242032Z test_partial_tensor_reshard_errors (__main__.TestPartialTensorReshard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 115679 2022-05-18T05:30:02.7361091Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 115680 2022-05-18T05:30:02.7479698Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 115681 2022-05-18T05:30:02.7601435Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 115682 2022-05-18T05:30:03.6519404Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:03.6640958Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:03.6720024Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:03.7238579Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:03.9646967Z skip: Need at least 4 CUDA devices (1.256s) 2022-05-18T05:30:03.9647223Z 2022-05-18T05:30:03.9647640Z ---------------------------------------------------------------------- 2022-05-18T05:30:03.9647971Z Ran 5 tests in 7.740s 2022-05-18T05:30:03.9648140Z 2022-05-18T05:30:03.9648333Z OK (skipped=5) 2022-05-18T05:30:03.9648606Z 2022-05-18T05:30:03.9648740Z Generating XML reports... 2022-05-18T05:30:03.9709874Z Generated XML report: test-reports/python-unittest/distributed._shard.test_partial_tensor/TEST-TestPartialTensorOps-20220518052956.xml 2022-05-18T05:30:03.9715583Z Generated XML report: test-reports/python-unittest/distributed._shard.test_partial_tensor/TEST-TestPartialTensorReshard-20220518052956.xml 2022-05-18T05:30:04.2538813Z Running distributed/fsdp/test_distributed_checkpoint ... [2022-05-18 05:30:04.253348] 2022-05-18T05:30:04.2540477Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_distributed_checkpoint.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:30:04.253445] 2022-05-18T05:30:05.1868842Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_distributed_checkpoint 2022-05-18T05:30:05.1886096Z 2022-05-18T05:30:05.1886615Z Running tests... 2022-05-18T05:30:05.1887125Z ---------------------------------------------------------------------- 2022-05-18T05:30:06.8433818Z test_distributed_checkpoint_state_dict_type_StateDictType_LOCAL_STATE_DICT (__main__.TestDistributedCheckpoint) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:30:06.8798153Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 115860 2022-05-18T05:30:06.8912792Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 115861 2022-05-18T05:30:07.7879237Z dist init r=0, world=2 2022-05-18T05:30:07.7882555Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:30:07.8016306Z dist init r=1, world=2 2022-05-18T05:30:07.8021149Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:30:07.8022113Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:30:07.8087653Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:30:09.1954575Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:09.1955101Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:09.5989896Z ok (4.410s) 2022-05-18T05:30:09.6135253Z test_distributed_checkpoint_state_dict_type_StateDictType_SHARDED_STATE_DICT (__main__.TestDistributedCheckpoint) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 115943 2022-05-18T05:30:09.6244357Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 115944 2022-05-18T05:30:10.5412364Z dist init r=1, world=2 2022-05-18T05:30:10.5415812Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:30:10.5447540Z dist init r=0, world=2 2022-05-18T05:30:10.5452400Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:30:10.5454191Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:30:10.5518909Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:30:11.9225180Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:11.9225729Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:12.2315222Z ok (2.632s) 2022-05-18T05:30:12.2315449Z 2022-05-18T05:30:12.2315859Z ---------------------------------------------------------------------- 2022-05-18T05:30:12.2316213Z Ran 2 tests in 7.043s 2022-05-18T05:30:12.2316389Z 2022-05-18T05:30:12.2316486Z OK 2022-05-18T05:30:12.2316608Z 2022-05-18T05:30:12.2316743Z Generating XML reports... 2022-05-18T05:30:12.2363457Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_distributed_checkpoint/TEST-TestDistributedCheckpoint-20220518053005.xml 2022-05-18T05:30:12.5025080Z Running distributed/_shard/sharded_tensor/ops/test_binary_cmp ... [2022-05-18 05:30:12.502032] 2022-05-18T05:30:12.5025896Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/ops/test_binary_cmp.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:30:12.502126] 2022-05-18T05:30:13.3943828Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_binary_cmp 2022-05-18T05:30:13.3959352Z 2022-05-18T05:30:13.3959809Z Running tests... 
2022-05-18T05:30:13.3960323Z ---------------------------------------------------------------------- 2022-05-18T05:30:13.3971579Z test_torch_allclose (__main__.TestShardedTensorBinaryOps) 2022-05-18T05:30:15.0150206Z Test torch.allclose(ShardedTensor, ShardedTensor) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:30:15.0511726Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 116063 2022-05-18T05:30:15.0625305Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 116064 2022-05-18T05:30:15.0740559Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 116065 2022-05-18T05:30:15.0856716Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 116066 2022-05-18T05:30:15.9865105Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:15.9865769Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:16.0505502Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:16.0537620Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:16.2905728Z skip: Need at least 4 CUDA devices (2.894s) 2022-05-18T05:30:16.3040798Z test_torch_allclose_tensor_specs (__main__.TestShardedTensorBinaryOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 116207 2022-05-18T05:30:16.3154268Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 116208 2022-05-18T05:30:16.3269579Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 116209 2022-05-18T05:30:16.3385823Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 116210 2022-05-18T05:30:17.2918466Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:17.3102994Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:17.3585213Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:17.3608315Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:17.5431192Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:30:17.5439108Z test_torch_equal (__main__.TestShardedTensorBinaryOps) 2022-05-18T05:30:17.5568936Z Test torch.equal(ShardedTensor, ShardedTensor) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 116351 2022-05-18T05:30:17.5682165Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 116352 2022-05-18T05:30:17.5796990Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 116353 2022-05-18T05:30:17.5911909Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 116354 2022-05-18T05:30:18.4975106Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:18.5010670Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:18.5600931Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:18.5601485Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:18.7954281Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:30:18.8087417Z test_torch_equal_tensor_specs (__main__.TestShardedTensorBinaryOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 116495 2022-05-18T05:30:18.8200144Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 116496 2022-05-18T05:30:18.8315967Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 116497 2022-05-18T05:30:18.8430865Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 116498 2022-05-18T05:30:19.7644004Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:19.7840637Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:19.7841302Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:19.8039002Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:20.0475959Z skip: Need at least 4 CUDA devices (1.252s) 2022-05-18T05:30:20.0476235Z 2022-05-18T05:30:20.0476643Z ---------------------------------------------------------------------- 2022-05-18T05:30:20.0476992Z Ran 4 tests in 6.652s 2022-05-18T05:30:20.0477141Z 2022-05-18T05:30:20.0477254Z OK (skipped=4) 2022-05-18T05:30:20.0477411Z 2022-05-18T05:30:20.0477539Z Generating XML reports... 2022-05-18T05:30:20.0540448Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_binary_cmp/TEST-TestShardedTensorBinaryOps-20220518053013.xml 2022-05-18T05:30:20.3287009Z Running distributed/_shard/sharded_tensor/ops/test_elementwise_ops ... [2022-05-18 05:30:20.328200] 2022-05-18T05:30:20.3287846Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/ops/test_elementwise_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:30:20.328296] 2022-05-18T05:30:21.2214364Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_elementwise_ops 2022-05-18T05:30:21.2229373Z 2022-05-18T05:30:21.2229648Z Running tests... 2022-05-18T05:30:21.2230081Z ---------------------------------------------------------------------- 2022-05-18T05:30:22.8398049Z test_sharded_dropout (__main__.TestShardedTensorElementWiseOps) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:30:22.8759489Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 116676 2022-05-18T05:30:22.8874860Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 116677 2022-05-18T05:30:22.8990879Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 116678 2022-05-18T05:30:22.9107656Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 116679 2022-05-18T05:30:23.8100050Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:23.8540911Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:23.8632956Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:23.8814504Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:24.1155319Z skip: Need at least 4 CUDA devices (2.892s) 2022-05-18T05:30:24.1295166Z test_sharded_gelu (__main__.TestShardedTensorElementWiseOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 116820 2022-05-18T05:30:24.1407301Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 116821 2022-05-18T05:30:24.1525332Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 116822 2022-05-18T05:30:24.1642205Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 116823 2022-05-18T05:30:25.0632719Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:25.0992604Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:25.1483099Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:25.1573039Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:25.3685353Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:30:25.3828014Z test_sharded_relu (__main__.TestShardedTensorElementWiseOps) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 116964 2022-05-18T05:30:25.3936755Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 116965 2022-05-18T05:30:25.4051431Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 116966 2022-05-18T05:30:25.4164784Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 116967 2022-05-18T05:30:26.2934033Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:26.3076756Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:26.3105774Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:26.3466997Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:26.5207249Z skip: Need at least 4 CUDA devices (1.152s) 2022-05-18T05:30:26.5207593Z 2022-05-18T05:30:26.5208133Z ---------------------------------------------------------------------- 2022-05-18T05:30:26.5208486Z Ran 3 tests in 5.298s 2022-05-18T05:30:26.5208636Z 2022-05-18T05:30:26.5208750Z OK (skipped=3) 2022-05-18T05:30:26.5208903Z 2022-05-18T05:30:26.5209031Z Generating XML reports... 
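Most of the sharded-tensor suites above report "skip: Need at least 4 CUDA devices" because this runner exposes fewer GPUs than the four-rank world size these tests require. A minimal sketch of the kind of device-count guard that produces such skips, written with plain unittest; the decorator name below is illustrative rather than the exact helper used by these suites:

    import unittest
    import torch

    def require_n_gpus(n):
        # Skip (rather than fail) when the machine has fewer than n CUDA devices.
        return unittest.skipIf(
            torch.cuda.device_count() < n,
            f"Need at least {n} CUDA devices",
        )

    class ExampleShardedOpTest(unittest.TestCase):
        @require_n_gpus(4)
        def test_elementwise_op(self):
            # The real tests spawn one process per GPU; this body is a placeholder.
            self.assertGreaterEqual(torch.cuda.device_count(), 4)

    if __name__ == "__main__":
        unittest.main(verbosity=2)
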
2022-05-18T05:30:26.5269088Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_elementwise_ops/TEST-TestShardedTensorElementWiseOps-20220518053021.xml 2022-05-18T05:30:26.7961499Z Running distributed/elastic/timer/local_timer_test ... [2022-05-18 05:30:26.795642] 2022-05-18T05:30:26.7962335Z Executing ['/opt/conda/bin/python', 'distributed/elastic/timer/local_timer_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:30:26.795737] 2022-05-18T05:30:27.6918651Z Test results will be stored in test-reports/python-unittest/distributed.elastic.timer.local_timer_test 2022-05-18T05:30:27.6939364Z 2022-05-18T05:30:27.6939831Z Running tests... 2022-05-18T05:30:27.6940353Z ---------------------------------------------------------------------- 2022-05-18T05:30:27.6951741Z test_acquire_release (__main__.LocalTimerServerTest) 2022-05-18T05:30:29.3859041Z tests that: ... ok (1.692s) 2022-05-18T05:30:29.3868727Z test_expired_timers (__main__.LocalTimerServerTest) 2022-05-18T05:30:29.3888254Z tests that a single expired timer on a process should terminate ... ok (0.003s) 2022-05-18T05:30:29.3904010Z test_valid_timers (__main__.LocalTimerServerTest) 2022-05-18T05:30:29.3922551Z tests that valid timers are processed correctly and the process is left alone ... ok (0.003s) 2022-05-18T05:30:29.3932583Z test_watchdog_call_count (__main__.LocalTimerServerTest) 2022-05-18T05:30:29.4971862Z checks that the watchdog function ran wait/interval +- 1 times ... ok (0.105s) 2022-05-18T05:30:29.4976391Z test_watchdog_empty_queue (__main__.LocalTimerServerTest) 2022-05-18T05:30:29.5086603Z checks that the watchdog can run on an empty queue ... ok (0.011s) 2022-05-18T05:30:29.5278167Z test_client_interaction (__main__.LocalTimerTest) ... ok (0.019s) 2022-05-18T05:30:29.5404986Z test_exception_propagation (__main__.LocalTimerTest) ... ok (0.012s) 2022-05-18T05:30:29.5418825Z test_get_timer_recursive (__main__.LocalTimerTest) 2022-05-18T05:30:30.9138982Z If a function acquires a countdown timer with default scope, ... ok (1.373s) 2022-05-18T05:30:31.0181574Z test_happy_path (__main__.LocalTimerTest) ... ok (0.104s) 2022-05-18T05:30:31.0298928Z test_no_client (__main__.LocalTimerTest) ... ok (0.012s) 2022-05-18T05:30:31.1815506Z test_timer (__main__.LocalTimerTest) ... ok (0.151s) 2022-05-18T05:30:31.2048247Z test_get (__main__.MultiprocessingRequestQueueTest) ... ok (0.023s) 2022-05-18T05:30:31.2057516Z test_get_less_than_size (__main__.MultiprocessingRequestQueueTest) 2022-05-18T05:30:31.7190908Z Tests slow producer. ... ok (0.514s) 2022-05-18T05:30:31.7212046Z test_get_size (__main__.MultiprocessingRequestQueueTest) 2022-05-18T05:30:32.6386529Z Creates a "producer" process that enqueues ``n`` elements ... ok (0.919s) 2022-05-18T05:30:32.6391316Z 2022-05-18T05:30:32.6391991Z ---------------------------------------------------------------------- 2022-05-18T05:30:32.6392388Z Ran 14 tests in 4.945s 2022-05-18T05:30:32.6392582Z 2022-05-18T05:30:32.6392692Z OK 2022-05-18T05:30:32.6392815Z 2022-05-18T05:30:32.6392959Z Generating XML reports... 
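The local_timer_test results above exercise torchelastic's expiration timers: a LocalTimerServer watchdog reaps worker processes whose timed scopes expire. A compact sketch following the documented torch.distributed.elastic.timer usage; the interval, timeout, and worker body are illustrative:

    import multiprocessing as mp
    import torch.distributed.elastic.timer as timer

    def worker(queue):
        # Each worker registers a client, then wraps risky sections in an expiring scope;
        # if the block does not finish within `after` seconds, the server kills the worker.
        timer.configure(timer.LocalTimerClient(queue))
        with timer.expires(after=60):
            pass  # real work goes here

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")
        q = ctx.Queue()
        server = timer.LocalTimerServer(q, max_interval=0.01)
        server.start()
        p = ctx.Process(target=worker, args=(q,))
        p.start()
        p.join()
        server.stop()
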
2022-05-18T05:30:32.6465346Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerServerTest-20220518053027.xml 2022-05-18T05:30:32.6476335Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerTest-20220518053027.xml 2022-05-18T05:30:32.6484254Z Generated XML report: test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-MultiprocessingRequestQueueTest-20220518053027.xml 2022-05-18T05:30:33.0334391Z Running distributed/test_data_parallel ... [2022-05-18 05:30:33.032946] 2022-05-18T05:30:33.0335131Z Executing ['/opt/conda/bin/python', 'distributed/test_data_parallel.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:30:33.033080] 2022-05-18T05:30:35.6316314Z Test results will be stored in test-reports/python-unittest/distributed.test_data_parallel 2022-05-18T05:30:35.6340185Z 2022-05-18T05:30:35.6340483Z Running tests... 2022-05-18T05:30:35.6340913Z ---------------------------------------------------------------------- 2022-05-18T05:30:37.1334212Z test_autocast (__main__.TestDataParallel) ... ok (1.499s) 2022-05-18T05:30:37.2355860Z test_data_parallel (__main__.TestDataParallel) ... ok (0.102s) 2022-05-18T05:30:37.2481159Z test_data_parallel_buffers_requiring_grad (__main__.TestDataParallel) ... ok (0.013s) 2022-05-18T05:30:37.2513710Z test_data_parallel_complex (__main__.TestDataParallel) ... ok (0.003s) 2022-05-18T05:30:37.2575066Z test_data_parallel_device_args (__main__.TestDataParallel) ... ok (0.006s) 2022-05-18T05:30:37.2635932Z test_data_parallel_function_deletion (__main__.TestDataParallel) ... ok (0.006s) 2022-05-18T05:30:37.2650611Z test_data_parallel_lazy_linear (__main__.TestDataParallel) ... /opt/conda/lib/python3.7/site-packages/torch/nn/modules/lazy.py:178: UserWarning: Lazy modules are a new feature under heavy development so changes to the API or functionality can happen at any moment. 2022-05-18T05:30:37.2651867Z warnings.warn('Lazy modules are a new feature under heavy development ' 2022-05-18T05:30:37.2660562Z ok (0.002s) 2022-05-18T05:30:37.2705050Z test_data_parallel_model_device (__main__.TestDataParallel) 2022-05-18T05:30:37.3009731Z Test device[0] check at forward time. ... ok (0.035s) 2022-05-18T05:30:37.3510718Z test_data_parallel_model_no_refcycles (__main__.TestDataParallel) ... ok (0.050s) 2022-05-18T05:30:37.3562040Z test_data_parallel_module_zero_inputs (__main__.TestDataParallel) ... ok (0.005s) 2022-05-18T05:30:37.3625197Z test_data_parallel_multiple_input (__main__.TestDataParallel) ... /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/comm.py:232: UserWarning: Using -1 to represent CPU tensor is deprecated. Please use a device object or string instead, e.g., "cpu". 2022-05-18T05:30:37.3625930Z 'Using -1 to represent CPU tensor is deprecated. Please use a ' 2022-05-18T05:30:37.3789750Z ok (0.023s) 2022-05-18T05:30:37.3820658Z test_data_parallel_nested_input (__main__.TestDataParallel) ... ok (0.003s) 2022-05-18T05:30:37.3885545Z test_data_parallel_nested_output (__main__.TestDataParallel) ... ok (0.006s) 2022-05-18T05:30:37.3928592Z test_data_parallel_no_grad (__main__.TestDataParallel) ... ok (0.004s) 2022-05-18T05:30:37.9655133Z test_data_parallel_rnn (__main__.TestDataParallel) ... Could not load symbol cublasGetSmCountTarget from libcublas.so.11. 
Error: /usr/local/cuda/lib64/libcublas.so.11: undefined symbol: cublasGetSmCountTarget 2022-05-18T05:30:38.4175644Z ok (1.024s) 2022-05-18T05:30:38.4209263Z test_data_parallel_small_back (__main__.TestDataParallel) ... ok (0.003s) 2022-05-18T05:30:38.4330939Z test_data_parallel_sparse (__main__.TestDataParallel) ... ok (0.012s) 2022-05-18T05:30:38.4568220Z test_gather_cpu (__main__.TestDataParallel) ... /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector. 2022-05-18T05:30:38.4568954Z warnings.warn('Was asked to gather along dimension 0, but all ' 2022-05-18T05:30:38.4796453Z ok (0.046s) 2022-05-18T05:30:38.4809111Z test_gather_different_len_dicts (__main__.TestDataParallel) ... ok (0.001s) 2022-05-18T05:30:38.5267180Z test_gather_gpu (__main__.TestDataParallel) ... ok (0.046s) 2022-05-18T05:30:38.5321548Z test_parallel_apply (__main__.TestDataParallel) ... ok (0.005s) 2022-05-18T05:30:38.5380183Z test_parallel_apply_autocast (__main__.TestDataParallel) ... ok (0.006s) 2022-05-18T05:30:38.5401880Z test_parallel_apply_passes_exception (__main__.TestDataParallel) ... ok (0.002s) 2022-05-18T05:30:38.5481869Z test_parameter_list_dict_replica (__main__.TestDataParallel) ... ok (0.008s) 2022-05-18T05:30:38.5528095Z test_replicate (__main__.TestDataParallel) ... ok (0.005s) 2022-05-18T05:30:38.5565752Z test_replicate_buffers (__main__.TestDataParallel) ... ok (0.004s) 2022-05-18T05:30:38.5600860Z test_save_replica_module (__main__.TestDataParallel) ... ok (0.003s) 2022-05-18T05:30:38.5792658Z test_scatter_cpu (__main__.TestDataParallel) ... ok (0.019s) 2022-05-18T05:30:38.5989190Z test_scatter_gpu (__main__.TestDataParallel) ... ok (0.020s) 2022-05-18T05:30:39.9154527Z test_strided_grad_layout (__main__.TestDataParallel) ... ok (1.316s) 2022-05-18T05:30:39.9210548Z test_zero_grad (__main__.TestDataParallel) ... ok (0.006s) 2022-05-18T05:30:39.9270543Z test_data_parallel_module_cuda_float16 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.006s) 2022-05-18T05:30:39.9325144Z test_data_parallel_module_cuda_float32 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.005s) 2022-05-18T05:30:39.9376816Z test_data_parallel_module_cuda_float64 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.005s) 2022-05-18T05:30:40.0683280Z test_data_parallel_module_kwargs_only_cuda_float16 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.130s) 2022-05-18T05:30:40.0974212Z test_data_parallel_module_kwargs_only_cuda_float32 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.029s) 2022-05-18T05:30:40.1254338Z test_data_parallel_module_kwargs_only_cuda_float64 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.028s) 2022-05-18T05:30:40.1537596Z test_data_parallel_module_kwargs_only_empty_dict_cuda_float16 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.028s) 2022-05-18T05:30:40.1817264Z test_data_parallel_module_kwargs_only_empty_dict_cuda_float32 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.028s) 2022-05-18T05:30:40.2100555Z test_data_parallel_module_kwargs_only_empty_dict_cuda_float64 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.028s) 2022-05-18T05:30:40.2388597Z test_data_parallel_module_kwargs_only_empty_list_cuda_float16 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.029s) 2022-05-18T05:30:40.2404123Z test_data_parallel_module_kwargs_only_empty_list_cuda_float32 (__main__.TestDataParallelDeviceTypeCUDA) ... 
skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/73923 for allplatform(s) . If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. (0.001s) 2022-05-18T05:30:40.2689635Z test_data_parallel_module_kwargs_only_empty_list_cuda_float64 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.028s) 2022-05-18T05:30:40.2971607Z test_data_parallel_module_kwargs_only_empty_tuple_cuda_float16 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.028s) 2022-05-18T05:30:40.3252049Z test_data_parallel_module_kwargs_only_empty_tuple_cuda_float32 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.028s) 2022-05-18T05:30:40.3532971Z test_data_parallel_module_kwargs_only_empty_tuple_cuda_float64 (__main__.TestDataParallelDeviceTypeCUDA) ... ok (0.028s) 2022-05-18T05:30:40.3533383Z 2022-05-18T05:30:40.3533745Z ---------------------------------------------------------------------- 2022-05-18T05:30:40.3534092Z Ran 46 tests in 4.719s 2022-05-18T05:30:40.3534258Z 2022-05-18T05:30:40.3534370Z OK (skipped=1) 2022-05-18T05:30:40.3534527Z 2022-05-18T05:30:40.3534654Z Generating XML reports... 2022-05-18T05:30:40.3601591Z Generated XML report: test-reports/python-unittest/distributed.test_data_parallel/TEST-TestDataParallel-20220518053035.xml 2022-05-18T05:30:40.3621085Z Generated XML report: test-reports/python-unittest/distributed.test_data_parallel/TEST-TestDataParallelDeviceTypeCUDA-20220518053035.xml 2022-05-18T05:30:41.0020403Z Running distributed/fsdp/test_fsdp_multiple_wrapping ... [2022-05-18 05:30:41.001539] 2022-05-18T05:30:41.0021204Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_multiple_wrapping.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:30:41.001642] 2022-05-18T05:30:41.9426279Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_multiple_wrapping 2022-05-18T05:30:41.9442555Z 2022-05-18T05:30:41.9442832Z Running tests... 2022-05-18T05:30:41.9443292Z ---------------------------------------------------------------------- 2022-05-18T05:30:41.9463526Z test_multiple_wrapping (__main__.TestMultipleWrapping) 2022-05-18T05:30:43.6183635Z This test simulates wrapping the module after training to run inference. ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:30:43.6549506Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 117450 2022-05-18T05:30:43.6664483Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 117451 2022-05-18T05:30:44.5729195Z dist init r=1, world=2 2022-05-18T05:30:44.5732300Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:30:44.5760070Z dist init r=0, world=2 2022-05-18T05:30:44.5764784Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:30:44.5766007Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:30:44.5835624Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
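The "dist init r=..., world=2" and "Completed store-based barrier ... with 2 nodes" messages above come from each rank initializing its process group; the barrier is performed inside init_process_group. A minimal two-rank sketch that produces the same kind of initialization; the address, port, and gloo backend are illustrative (the FSDP tests themselves run on GPUs):

    import os
    import torch.distributed as dist
    import torch.multiprocessing as mp

    def run(rank, world_size):
        os.environ["MASTER_ADDR"] = "127.0.0.1"
        os.environ["MASTER_PORT"] = "29500"
        # init_process_group performs the store-based barrier seen in the log.
        dist.init_process_group(backend="gloo", rank=rank, world_size=world_size)
        print(f"dist init r={rank}, world={world_size}")
        dist.destroy_process_group()

    if __name__ == "__main__":
        mp.spawn(run, args=(2,), nprocs=2)
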
2022-05-18T05:30:45.9618810Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:45.9619377Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:45.9841978Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 0 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:30:45.9842658Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:30:45.9878395Z /opt/conda/lib/python3.7/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:912: UserWarning: Module is input on CPU, we are moving it to 1 to perform parameter verification, flattening, sharding, and will move it back after. 2022-05-18T05:30:45.9879039Z f"Module is input on CPU, we are moving it to {torch.cuda.current_device()}" 2022-05-18T05:30:46.5746462Z ok (4.630s) 2022-05-18T05:30:46.5746694Z 2022-05-18T05:30:46.5747089Z ---------------------------------------------------------------------- 2022-05-18T05:30:46.5747791Z Ran 1 test in 4.630s 2022-05-18T05:30:46.5747972Z 2022-05-18T05:30:46.5748070Z OK 2022-05-18T05:30:46.5748211Z 2022-05-18T05:30:46.5748351Z Generating XML reports... 2022-05-18T05:30:46.5791317Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_multiple_wrapping/TEST-TestMultipleWrapping-20220518053041.xml 2022-05-18T05:30:46.8417530Z Running distributed/fsdp/test_fsdp_pure_fp16 ... [2022-05-18 05:30:46.841194] 2022-05-18T05:30:46.8418344Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_fsdp_pure_fp16.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:30:46.841291] 2022-05-18T05:30:47.7870239Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_fsdp_pure_fp16 2022-05-18T05:30:47.7893990Z 2022-05-18T05:30:47.7894440Z Running tests... 2022-05-18T05:30:47.7894928Z ---------------------------------------------------------------------- 2022-05-18T05:30:49.4325031Z test_pure_fp16_cpu_offload_CPUOffload(offload_params=False) (__main__.TestPureFP16) ... INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:30:49.4450781Z skip: Test is disabled because an issue exists disabling it: https://github.com/pytorch/pytorch/issues/73315 for platform(s) linux. If you're seeing this on your local machine and would like to enable this test, please make sure IN_CI is not set and you are not using the flag --import-disabled-tests. (1.655s) 2022-05-18T05:30:49.4711157Z test_pure_fp16_cpu_offload_CPUOffload(offload_params=True) (__main__.TestPureFP16) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 117574 2022-05-18T05:30:49.4826636Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 117575 2022-05-18T05:30:50.4178633Z dist init r=0, world=2 2022-05-18T05:30:50.4181895Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 2022-05-18T05:30:50.4407477Z dist init r=1, world=2 2022-05-18T05:30:50.4412444Z INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1 2022-05-18T05:30:50.4413608Z INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 
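The UserWarning above ("Module is input on CPU, we are moving it to ...") is emitted when a CPU-resident module is handed to FullyShardedDataParallel, which then shuttles it to the GPU and back during wrapping. A sketch of wrapping an already-CUDA module instead, reduced to a single rank so it is self-contained; the model, port, and sizes are illustrative:

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def main():
        # Single-rank group just for the example; the tests above run with world=2.
        dist.init_process_group(backend="nccl",
                                init_method="tcp://127.0.0.1:29501",
                                rank=0, world_size=1)
        device = torch.device("cuda", 0)
        model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1)).to(device)
        wrapped = FSDP(model)  # moving to GPU first avoids the CPU round-trip warning
        wrapped(torch.randn(4, 8, device=device)).sum().backward()
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()
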
2022-05-18T05:30:50.4488845Z INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes. 2022-05-18T05:30:51.8279869Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:51.8280398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:52.4923139Z ok (3.047s) 2022-05-18T05:30:52.4923370Z 2022-05-18T05:30:52.4923774Z ---------------------------------------------------------------------- 2022-05-18T05:30:52.4924135Z Ran 2 tests in 4.703s 2022-05-18T05:30:52.4924304Z 2022-05-18T05:30:52.4924405Z OK (skipped=1) 2022-05-18T05:30:52.4924565Z 2022-05-18T05:30:52.4924697Z Generating XML reports... 2022-05-18T05:30:52.4974244Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_fsdp_pure_fp16/TEST-TestPureFP16-20220518053047.xml 2022-05-18T05:30:52.7718849Z Running distributed/_shard/sharded_tensor/ops/test_softmax ... [2022-05-18 05:30:52.771407] 2022-05-18T05:30:52.7719642Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/ops/test_softmax.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:30:52.771499] 2022-05-18T05:30:53.6681318Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpeyw59ca8 2022-05-18T05:30:53.6682709Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpeyw59ca8/_remote_module_non_scriptable.py 2022-05-18T05:30:53.6826584Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_softmax 2022-05-18T05:30:53.6843018Z 2022-05-18T05:30:53.6843296Z Running tests... 2022-05-18T05:30:53.6843738Z ---------------------------------------------------------------------- 2022-05-18T05:30:55.2979994Z test_sharded_softmax_basic (__main__.TestShardedSoftmax) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:30:55.3342738Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 117698 2022-05-18T05:30:55.3454801Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 117699 2022-05-18T05:30:55.3568956Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 117700 2022-05-18T05:30:55.3684001Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 117701 2022-05-18T05:30:56.2952305Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_36t45_y 2022-05-18T05:30:56.2953464Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_36t45_y/_remote_module_non_scriptable.py 2022-05-18T05:30:56.3072400Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:56.3435047Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpe_9k2c2t 2022-05-18T05:30:56.3437058Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpe_9k2c2t/_remote_module_non_scriptable.py 2022-05-18T05:30:56.3461447Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpjgqnd21p 2022-05-18T05:30:56.3463941Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpjgqnd21p/_remote_module_non_scriptable.py 2022-05-18T05:30:56.3564015Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:56.3579192Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpnazjvp4o 2022-05-18T05:30:56.3582222Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpnazjvp4o/_remote_module_non_scriptable.py 2022-05-18T05:30:56.3585311Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:56.3706527Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:56.5731663Z skip: Need at least 4 CUDA devices (2.889s) 2022-05-18T05:30:56.5861709Z test_sharded_softmax_on_sharding_dim (__main__.TestShardedSoftmax) ... 
INFO:torch.testing._internal.common_distributed:Started process 0 with pid 117842 2022-05-18T05:30:56.5971359Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 117843 2022-05-18T05:30:56.6084824Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 117844 2022-05-18T05:30:56.6199474Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 117845 2022-05-18T05:30:57.4985650Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp_j44fuvk 2022-05-18T05:30:57.4986686Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp_j44fuvk/_remote_module_non_scriptable.py 2022-05-18T05:30:57.5106569Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:30:57.5386337Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpgt5evhhc 2022-05-18T05:30:57.5388328Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpgt5evhhc/_remote_module_non_scriptable.py 2022-05-18T05:30:57.5409816Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmp7dh2btde 2022-05-18T05:30:57.5413090Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmp7dh2btde/_remote_module_non_scriptable.py 2022-05-18T05:30:57.5520971Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:30:57.5534291Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:30:57.5734363Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpw9t6ve2w 2022-05-18T05:30:57.5735808Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpw9t6ve2w/_remote_module_non_scriptable.py 2022-05-18T05:30:57.5868908Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:30:57.8242898Z skip: Need at least 4 CUDA devices (1.251s) 2022-05-18T05:30:57.8243157Z 2022-05-18T05:30:57.8243564Z ---------------------------------------------------------------------- 2022-05-18T05:30:57.8243895Z Ran 2 tests in 4.140s 2022-05-18T05:30:57.8244065Z 2022-05-18T05:30:57.8244182Z OK (skipped=2) 2022-05-18T05:30:57.8244348Z 2022-05-18T05:30:57.8244765Z Generating XML reports... 2022-05-18T05:30:57.8291128Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_softmax/TEST-TestShardedSoftmax-20220518053053.xml 2022-05-18T05:30:58.0916210Z Running distributed/_shard/sharded_tensor/test_sharded_tensor_reshard ... [2022-05-18 05:30:58.091111] 2022-05-18T05:30:58.0917060Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/test_sharded_tensor_reshard.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:30:58.091206] 2022-05-18T05:30:59.0048934Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor_reshard 2022-05-18T05:30:59.0066083Z 2022-05-18T05:30:59.0066530Z Running tests... 2022-05-18T05:30:59.0067030Z ---------------------------------------------------------------------- 2022-05-18T05:31:00.6642205Z test_sharded_tensor_reshard (__main__.TestReshard) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:31:00.7012163Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 118023 2022-05-18T05:31:00.7127005Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 118024 2022-05-18T05:31:00.7245482Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 118025 2022-05-18T05:31:00.7363253Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 118026 2022-05-18T05:31:01.6999513Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:31:01.7064794Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:31:01.7225603Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:31:01.7374825Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:31:01.9410186Z skip: Need at least 4 CUDA devices (2.934s) 2022-05-18T05:31:01.9553219Z test_sharded_tensor_reshard_errors (__main__.TestReshard) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 118167 2022-05-18T05:31:01.9669745Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 118168 2022-05-18T05:31:01.9784039Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 118169 2022-05-18T05:31:01.9899898Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 118170 2022-05-18T05:31:02.9092335Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:31:02.9092901Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:31:02.9346171Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:31:02.9612398Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:31:03.1942351Z skip: Need at least 4 CUDA devices (1.253s) 2022-05-18T05:31:03.1942597Z 2022-05-18T05:31:03.1942981Z ---------------------------------------------------------------------- 2022-05-18T05:31:03.1943333Z Ran 2 tests in 4.188s 2022-05-18T05:31:03.1943509Z 2022-05-18T05:31:03.1943624Z OK (skipped=2) 2022-05-18T05:31:03.1943784Z 2022-05-18T05:31:03.1943915Z Generating XML reports... 2022-05-18T05:31:03.2003816Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor_reshard/TEST-TestReshard-20220518053058.xml 2022-05-18T05:31:03.4646807Z Running distributed/_shard/sharded_optim/test_sharded_optim ... [2022-05-18 05:31:03.464174] 2022-05-18T05:31:03.4647972Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_optim/test_sharded_optim.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:31:03.464274] 2022-05-18T05:31:04.3809826Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_optim.test_sharded_optim 2022-05-18T05:31:04.3826369Z 2022-05-18T05:31:04.3827018Z Running tests... 2022-05-18T05:31:04.3827560Z ---------------------------------------------------------------------- 2022-05-18T05:31:06.0446783Z test_named_params_with_sharded_tensor (__main__.TestShardedOptimizer) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:31:06.0818956Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 118348 2022-05-18T05:31:06.0934960Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 118349 2022-05-18T05:31:06.1054413Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 118350 2022-05-18T05:31:06.1174153Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 118351 2022-05-18T05:31:07.0985124Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:31:07.1004157Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:31:07.1017712Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:31:07.1029901Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:31:07.3222678Z skip: Need at least 4 CUDA devices (2.939s) 2022-05-18T05:31:07.3376287Z test_sharded_optim (__main__.TestShardedOptimizer) ... INFO:torch.testing._internal.common_distributed:Started process 0 with pid 118492 2022-05-18T05:31:07.3487670Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 118493 2022-05-18T05:31:07.3604207Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 118494 2022-05-18T05:31:07.3720407Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 118495 2022-05-18T05:31:08.2805240Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:31:08.2853891Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:31:08.2881434Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:31:08.3386883Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:31:08.4759482Z skip: Need at least 4 CUDA devices (1.154s) 2022-05-18T05:31:08.4759738Z 2022-05-18T05:31:08.4760128Z ---------------------------------------------------------------------- 2022-05-18T05:31:08.4760477Z Ran 2 tests in 4.093s 2022-05-18T05:31:08.4760647Z 2022-05-18T05:31:08.4760740Z OK (skipped=2) 2022-05-18T05:31:08.4760897Z 2022-05-18T05:31:08.4761045Z Generating XML reports... 2022-05-18T05:31:08.4819050Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_optim.test_sharded_optim/TEST-TestShardedOptimizer-20220518053104.xml 2022-05-18T05:31:08.7429170Z Running distributed/_shard/sharded_tensor/test_megatron_prototype ... [2022-05-18 05:31:08.742455] 2022-05-18T05:31:08.7429969Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/test_megatron_prototype.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:31:08.742553] 2022-05-18T05:31:09.6143454Z Test results will be stored in test-reports/python-unittest/distributed._shard.sharded_tensor.test_megatron_prototype 2022-05-18T05:31:09.6161015Z 2022-05-18T05:31:09.6161496Z Running tests... 2022-05-18T05:31:09.6162017Z ---------------------------------------------------------------------- 2022-05-18T05:31:11.2816884Z test_megatron_two_layer_prototype (__main__.TestShardedTensorMegatronLinear) ... 
INFO:numba.cuda.cudadrv.driver:init 2022-05-18T05:31:11.3189489Z INFO:torch.testing._internal.common_distributed:Started process 0 with pid 118673 2022-05-18T05:31:11.3307961Z INFO:torch.testing._internal.common_distributed:Started process 1 with pid 118674 2022-05-18T05:31:11.3426646Z INFO:torch.testing._internal.common_distributed:Started process 2 with pid 118675 2022-05-18T05:31:11.3545553Z INFO:torch.testing._internal.common_distributed:Started process 3 with pid 118676 2022-05-18T05:31:12.2318676Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 1 2022-05-18T05:31:12.2462241Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 3 2022-05-18T05:31:12.2784127Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 2 2022-05-18T05:31:12.2836710Z INFO:torch.testing._internal.common_distributed:Starting event listener thread for rank 0 2022-05-18T05:31:12.4590983Z skip: Need at least 4 CUDA devices (2.843s) 2022-05-18T05:31:12.4591297Z 2022-05-18T05:31:12.4591696Z ---------------------------------------------------------------------- 2022-05-18T05:31:12.4592041Z Ran 1 test in 2.843s 2022-05-18T05:31:12.4592209Z 2022-05-18T05:31:12.4592328Z OK (skipped=1) 2022-05-18T05:31:12.4592491Z 2022-05-18T05:31:12.4592602Z Generating XML reports... 2022-05-18T05:31:12.4649599Z Generated XML report: test-reports/python-unittest/distributed._shard.sharded_tensor.test_megatron_prototype/TEST-TestShardedTensorMegatronLinear-20220518053109.xml 2022-05-18T05:31:12.7361032Z Running distributed/test_launcher ... [2022-05-18 05:31:12.735619] 2022-05-18T05:31:12.7362254Z Executing ['/opt/conda/bin/python', 'distributed/test_launcher.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:31:12.735739] 2022-05-18T05:31:13.9291262Z Test results will be stored in test-reports/python-unittest/distributed.test_launcher 2022-05-18T05:31:13.9307759Z 2022-05-18T05:31:13.9308117Z Running tests... 2022-05-18T05:31:13.9308624Z ---------------------------------------------------------------------- 2022-05-18T05:31:15.6117701Z test_launch_user_script (__main__.TestDistributedLaunch) ... /opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py:186: FutureWarning: The module torch.distributed.launch is deprecated 2022-05-18T05:31:15.6118292Z and will be removed in future. Use torchrun. 2022-05-18T05:31:15.6118705Z Note that --use_env is set by default in torchrun. 2022-05-18T05:31:15.6119161Z If your script expects `--local_rank` argument to be set, please 2022-05-18T05:31:15.6119624Z change it to read from `os.environ['LOCAL_RANK']` instead. See 2022-05-18T05:31:15.6120110Z https://pytorch.org/docs/stable/distributed.html#launch-utility for 2022-05-18T05:31:15.6120463Z further instructions 2022-05-18T05:31:15.6120633Z 2022-05-18T05:31:15.6120745Z FutureWarning, 2022-05-18T05:31:15.6132183Z WARNING:torch.distributed.run: 2022-05-18T05:31:15.6132491Z ***************************************** 2022-05-18T05:31:15.6133037Z Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
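The FutureWarning above spells out the migration path from torch.distributed.launch to torchrun: drop --local_rank and read LOCAL_RANK from the environment. A minimal worker script along those lines, launched for example with "torchrun --nproc_per_node=4 smoke.py"; the script name and print message are illustrative (the "Success, smoke test" lines below are printed by the test's own launched workers):

    import os
    import torch.distributed as dist

    def main():
        local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun for each worker
        # RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are also provided by torchrun,
        # so the default env:// initialization needs no extra arguments.
        dist.init_process_group(backend="gloo")
        print(f"Success, smoke test (local_rank={local_rank}, rank={dist.get_rank()})")
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()
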
2022-05-18T05:31:15.6133558Z ***************************************** 2022-05-18T05:31:15.6597209Z Success, smoke test 2022-05-18T05:31:15.6796108Z Success, smoke test 2022-05-18T05:31:15.7006686Z Success, smoke test 2022-05-18T05:31:15.7208718Z Success, smoke test 2022-05-18T05:31:16.7076666Z ok (2.777s) 2022-05-18T05:31:16.7079426Z 2022-05-18T05:31:16.7080015Z ---------------------------------------------------------------------- 2022-05-18T05:31:16.7080373Z Ran 1 test in 2.777s 2022-05-18T05:31:16.7080564Z 2022-05-18T05:31:16.7080673Z OK 2022-05-18T05:31:16.7080826Z 2022-05-18T05:31:16.7080974Z Generating XML reports... 2022-05-18T05:31:16.7152897Z Generated XML report: test-reports/python-unittest/distributed.test_launcher/TEST-TestDistributedLaunch-20220518053113.xml 2022-05-18T05:31:17.0000500Z Running distributed/elastic/utils/util_test ... [2022-05-18 05:31:16.999581] 2022-05-18T05:31:17.0001561Z Executing ['/opt/conda/bin/python', 'distributed/elastic/utils/util_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:31:16.999676] 2022-05-18T05:31:17.9103214Z Test results will be stored in test-reports/python-unittest/distributed.elastic.utils.util_test 2022-05-18T05:31:17.9120428Z 2022-05-18T05:31:17.9120943Z Running tests... 2022-05-18T05:31:17.9121404Z ---------------------------------------------------------------------- 2022-05-18T05:31:19.5881807Z test_get_all_rank_0 (__main__.StoreUtilTest) ... ok (1.676s) 2022-05-18T05:31:19.5902109Z test_get_all_rank_n (__main__.StoreUtilTest) ... ok (0.002s) 2022-05-18T05:31:19.5927330Z test_synchronize (__main__.StoreUtilTest) ... ok (0.002s) 2022-05-18T05:31:19.6695059Z test_get_logger (__main__.UtilTest) ... ok (0.077s) 2022-05-18T05:31:19.6702059Z test_get_logger_custom_name (__main__.UtilTest) ... ok (0.001s) 2022-05-18T05:31:19.6711834Z test_get_logger_different (__main__.UtilTest) ... ok (0.001s) 2022-05-18T05:31:19.6725462Z test_get_logger_none (__main__.UtilTest) ... ok (0.001s) 2022-05-18T05:31:19.6725837Z 2022-05-18T05:31:19.6726327Z ---------------------------------------------------------------------- 2022-05-18T05:31:19.6726685Z Ran 7 tests in 1.761s 2022-05-18T05:31:19.6726864Z 2022-05-18T05:31:19.6726966Z OK 2022-05-18T05:31:19.6727104Z 2022-05-18T05:31:19.6727249Z Generating XML reports... 2022-05-18T05:31:19.6761101Z Generated XML report: test-reports/python-unittest/distributed.elastic.utils.util_test/TEST-StoreUtilTest-20220518053117.xml 2022-05-18T05:31:19.6767550Z Generated XML report: test-reports/python-unittest/distributed.elastic.utils.util_test/TEST-UtilTest-20220518053117.xml 2022-05-18T05:31:19.9083722Z Running distributed/elastic/metrics/api_test ... [2022-05-18 05:31:19.907881] 2022-05-18T05:31:19.9084488Z Executing ['/opt/conda/bin/python', 'distributed/elastic/metrics/api_test.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:31:19.907976] 2022-05-18T05:31:20.7981694Z Test results will be stored in test-reports/python-unittest/distributed.elastic.metrics.api_test 2022-05-18T05:31:20.7997282Z 2022-05-18T05:31:20.7997482Z Running tests... 2022-05-18T05:31:20.7997928Z ---------------------------------------------------------------------- 2022-05-18T05:31:22.4760742Z test_get_metric_name (__main__.MetricsApiTest) ... ok (1.676s) 2022-05-18T05:31:22.4773821Z test_inheritance (__main__.MetricsApiTest) ... ok (0.001s) 2022-05-18T05:31:22.4793532Z test_profile (__main__.MetricsApiTest) ... 
ok (0.002s) 2022-05-18T05:31:22.4794062Z 2022-05-18T05:31:22.4794440Z ---------------------------------------------------------------------- 2022-05-18T05:31:22.4794786Z Ran 3 tests in 1.680s 2022-05-18T05:31:22.4794960Z 2022-05-18T05:31:22.4795058Z OK 2022-05-18T05:31:22.4795194Z 2022-05-18T05:31:22.4795305Z Generating XML reports... 2022-05-18T05:31:22.4830056Z Generated XML report: test-reports/python-unittest/distributed.elastic.metrics.api_test/TEST-MetricsApiTest-20220518053120.xml 2022-05-18T05:31:22.7141421Z Running distributed/fsdp/test_utils ... [2022-05-18 05:31:22.713665] 2022-05-18T05:31:22.7142152Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_utils.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:31:22.713763] 2022-05-18T05:31:23.6237857Z Test results will be stored in test-reports/python-unittest/distributed.fsdp.test_utils 2022-05-18T05:31:23.6254181Z 2022-05-18T05:31:23.6254519Z Running tests... 2022-05-18T05:31:23.6254968Z ---------------------------------------------------------------------- 2022-05-18T05:31:25.2900254Z test_apply_to_tensors_cpu_cuda (__main__.TestUtils) ... ok (1.664s) 2022-05-18T05:31:25.2934233Z test_apply_to_tensors_devices_['cpu'] (__main__.TestUtils) ... ok (0.003s) 2022-05-18T05:31:25.2965139Z test_apply_to_tensors_devices_['cuda'] (__main__.TestUtils) ... ok (0.003s) 2022-05-18T05:31:25.2974649Z test_packed_sequence (__main__.TestUtils) 2022-05-18T05:31:25.2995669Z Test to ensure RNN packed sequences are modified correctly. ... ok (0.003s) 2022-05-18T05:31:25.3008205Z test_replace_by_prefix (__main__.TestUtils) ... ok (0.001s) 2022-05-18T05:31:25.3008803Z 2022-05-18T05:31:25.3009110Z ---------------------------------------------------------------------- 2022-05-18T05:31:25.3009466Z Ran 5 tests in 1.675s 2022-05-18T05:31:25.3009751Z 2022-05-18T05:31:25.3009862Z OK 2022-05-18T05:31:25.3010001Z 2022-05-18T05:31:25.3010130Z Generating XML reports... 2022-05-18T05:31:25.3051920Z Generated XML report: test-reports/python-unittest/distributed.fsdp.test_utils/TEST-TestUtils-20220518053123.xml 2022-05-18T05:31:25.5416000Z Running distributed/_shard/sharded_tensor/ops/test_math_ops ... [2022-05-18 05:31:25.541102] 2022-05-18T05:31:25.5416782Z Executing ['/opt/conda/bin/python', 'distributed/_shard/sharded_tensor/ops/test_math_ops.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:31:25.541196] 2022-05-18T05:31:26.5175732Z Running distributed/_shard/test_replicated_tensor ... [2022-05-18 05:31:26.516959] 2022-05-18T05:31:26.5176541Z Executing ['/opt/conda/bin/python', 'distributed/_shard/test_replicated_tensor.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:31:26.517156] 2022-05-18T05:31:27.5049517Z Running distributed/elastic/events/lib_test ... [2022-05-18 05:31:27.504493] 2022-05-18T05:31:27.5050191Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/elastic/events/lib_test.py', '-v'] ... 
[2022-05-18 05:31:27.504594] 2022-05-18T05:31:28.2251356Z ============================= test session starts ============================== 2022-05-18T05:31:28.2301951Z platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:31:28.2302324Z cachedir: .pytest_cache 2022-05-18T05:31:28.2302911Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:31:28.2303421Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:31:28.2303809Z plugins: hypothesis-4.53.2 2022-05-18T05:31:29.0459560Z collecting ...  2022-05-18T05:31:29.0473709Z collecting 3 items  2022-05-18T05:31:29.0474184Z collected 8 items  2022-05-18T05:31:29.0479173Z 2022-05-18T05:31:29.0497185Z distributed/elastic/events/lib_test.py::EventLibTest::test_event_created PASSED [ 12%] 2022-05-18T05:31:29.0511898Z distributed/elastic/events/lib_test.py::EventLibTest::test_event_deser PASSED [ 25%] 2022-05-18T05:31:29.0529682Z distributed/elastic/events/lib_test.py::EventLibTest::test_get_or_create_logger PASSED [ 37%] 2022-05-18T05:31:29.1206430Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_construct_and_record_rdzv_event PASSED [ 50%] 2022-05-18T05:31:29.1225845Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_construct_and_record_rdzv_event_does_not_run_if_invalid_dest PASSED [ 62%] 2022-05-18T05:31:29.1239207Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_created PASSED [ 75%] 2022-05-18T05:31:29.1254192Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_deserialize PASSED [ 87%] 2022-05-18T05:31:29.1275577Z distributed/elastic/events/lib_test.py::RdzvEventLibTest::test_rdzv_event_str PASSED [100%] 2022-05-18T05:31:29.1277610Z 2022-05-18T05:31:29.1278013Z ============================== 8 passed in 0.90s =============================== 2022-05-18T05:31:29.2743009Z Running distributed/fsdp/test_shard_utils ... [2022-05-18 05:31:29.273872] 2022-05-18T05:31:29.2743736Z Executing ['/opt/conda/bin/python', 'distributed/fsdp/test_shard_utils.py', '-v', '--import-slow-tests', '--import-disabled-tests'] ... [2022-05-18 05:31:29.273972] 2022-05-18T05:31:30.3045858Z Running distributed/pipeline/sync/skip/test_gpipe ... [2022-05-18 05:31:30.304118] 2022-05-18T05:31:30.3046546Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_gpipe.py', '-v'] ... [2022-05-18 05:31:30.304217] 2022-05-18T05:31:31.5398049Z ============================= test session starts ============================== 2022-05-18T05:31:31.5398715Z platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:31:31.5419428Z cachedir: .pytest_cache 2022-05-18T05:31:31.5420604Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:31:31.5421275Z torch: 1.12.0a0+git3b23752 2022-05-18T05:31:31.5421617Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:31:31.5422020Z plugins: hypothesis-4.53.2 2022-05-18T05:31:31.5724285Z collecting ...  
2022-05-18T05:31:31.5725178Z collected 13 items  2022-05-18T05:31:31.5729196Z 2022-05-18T05:31:34.0889167Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-3] PASSED [ 7%] 2022-05-18T05:31:35.8831580Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-1:2] PASSED [ 15%] 2022-05-18T05:31:35.9328329Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-2:1] PASSED [ 23%] 2022-05-18T05:31:35.9483195Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[never-1:1:1] SKIPPED [ 30%] 2022-05-18T05:31:36.0062438Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-3] PASSED [ 38%] 2022-05-18T05:31:36.0607372Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-1:2] PASSED [ 46%] 2022-05-18T05:31:36.1113644Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-2:1] PASSED [ 53%] 2022-05-18T05:31:36.1265399Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[always-1:1:1] SKIPPED [ 61%] 2022-05-18T05:31:36.1749595Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-3] PASSED [ 69%] 2022-05-18T05:31:36.2260407Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-1:2] PASSED [ 76%] 2022-05-18T05:31:36.2736841Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-2:1] PASSED [ 84%] 2022-05-18T05:31:36.2889044Z distributed/pipeline/sync/skip/test_gpipe.py::test_1to3[except_last-1:1:1] SKIPPED [ 92%] 2022-05-18T05:31:36.3159470Z distributed/pipeline/sync/skip/test_gpipe.py::test_none_skip PASSED [100%] 2022-05-18T05:31:36.3162342Z 2022-05-18T05:31:36.3162598Z =========================== short test summary info ============================ 2022-05-18T05:31:36.3163051Z SKIPPED [3] distributed/pipeline/sync/skip/test_gpipe.py:24: at least 3 cuda devices required 2022-05-18T05:31:36.3163792Z ======================== 10 passed, 3 skipped in 4.78s ========================= 2022-05-18T05:31:36.8600099Z Running distributed/pipeline/sync/skip/test_leak ... [2022-05-18 05:31:36.859490] 2022-05-18T05:31:36.8600781Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_leak.py', '-v'] ... [2022-05-18 05:31:36.859591] 2022-05-18T05:31:38.0927270Z ============================= test session starts ============================== 2022-05-18T05:31:38.0927852Z platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:31:38.0948582Z cachedir: .pytest_cache 2022-05-18T05:31:38.0949575Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:31:38.0950061Z torch: 1.12.0a0+git3b23752 2022-05-18T05:31:38.0950723Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:31:38.0951107Z plugins: hypothesis-4.53.2 2022-05-18T05:31:38.1129548Z collecting ...  
2022-05-18T05:31:38.1130120Z collected 8 items  2022-05-18T05:31:38.1134577Z 2022-05-18T05:31:38.2095902Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[always-train] PASSED [ 12%] 2022-05-18T05:31:38.2279428Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[always-eval] PASSED [ 25%] 2022-05-18T05:31:38.2489303Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[except_last-train] PASSED [ 37%] 2022-05-18T05:31:38.2676156Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[except_last-eval] PASSED [ 50%] 2022-05-18T05:31:38.2872009Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[never-train] PASSED [ 62%] 2022-05-18T05:31:38.3057519Z distributed/pipeline/sync/skip/test_leak.py::test_delete_portal_tensor[never-eval] PASSED [ 75%] 2022-05-18T05:31:38.3215607Z distributed/pipeline/sync/skip/test_leak.py::test_no_portal_without_pipe[train] PASSED [ 87%] 2022-05-18T05:31:38.3373279Z distributed/pipeline/sync/skip/test_leak.py::test_no_portal_without_pipe[eval] PASSED [100%] 2022-05-18T05:31:38.3374300Z 2022-05-18T05:31:38.3374783Z ============================== 8 passed in 0.24s =============================== 2022-05-18T05:31:38.4777858Z Running distributed/pipeline/sync/skip/test_stash_pop ... [2022-05-18 05:31:38.477350] 2022-05-18T05:31:38.4778746Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_stash_pop.py', '-v'] ... [2022-05-18 05:31:38.477450] 2022-05-18T05:31:39.7340804Z ============================= test session starts ============================== 2022-05-18T05:31:39.7341446Z platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:31:39.7362100Z cachedir: .pytest_cache 2022-05-18T05:31:39.7362737Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:31:39.7363184Z torch: 1.12.0a0+git3b23752 2022-05-18T05:31:39.7363527Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:31:39.7363936Z plugins: hypothesis-4.53.2 2022-05-18T05:31:39.7535973Z collecting ...  2022-05-18T05:31:39.7536385Z collected 7 items  2022-05-18T05:31:39.7540996Z 2022-05-18T05:31:39.7591321Z distributed/pipeline/sync/skip/test_stash_pop.py::test_stash PASSED [ 14%] 2022-05-18T05:31:39.7613830Z distributed/pipeline/sync/skip/test_stash_pop.py::test_pop PASSED [ 28%] 2022-05-18T05:31:39.7636824Z distributed/pipeline/sync/skip/test_stash_pop.py::test_declare_but_not_use PASSED [ 42%] 2022-05-18T05:31:39.7657131Z distributed/pipeline/sync/skip/test_stash_pop.py::test_stash_not_declared PASSED [ 57%] 2022-05-18T05:31:39.7678864Z distributed/pipeline/sync/skip/test_stash_pop.py::test_pop_not_declared PASSED [ 71%] 2022-05-18T05:31:39.7699235Z distributed/pipeline/sync/skip/test_stash_pop.py::test_pop_not_stashed PASSED [ 85%] 2022-05-18T05:31:39.7722897Z distributed/pipeline/sync/skip/test_stash_pop.py::test_stash_none PASSED [100%] 2022-05-18T05:31:39.7724670Z 2022-05-18T05:31:39.7725358Z ============================== 7 passed in 0.04s =============================== 2022-05-18T05:31:39.9107048Z Running distributed/pipeline/sync/skip/test_verify_skippables ... [2022-05-18 05:31:39.910254] 2022-05-18T05:31:39.9107741Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/skip/test_verify_skippables.py', '-v'] ... 
[2022-05-18 05:31:39.910356] 2022-05-18T05:31:41.0865176Z ============================= test session starts ============================== 2022-05-18T05:31:41.0866284Z platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:31:41.0888208Z cachedir: .pytest_cache 2022-05-18T05:31:41.0889809Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:31:41.0891085Z torch: 1.12.0a0+git3b23752 2022-05-18T05:31:41.0891760Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:31:41.0892577Z plugins: hypothesis-4.53.2 2022-05-18T05:31:41.1095011Z collecting ...  2022-05-18T05:31:41.1095860Z collected 9 items  2022-05-18T05:31:41.1101220Z 2022-05-18T05:31:41.1146201Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_matching PASSED [ 11%] 2022-05-18T05:31:41.1167475Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_not_pop PASSED [ 22%] 2022-05-18T05:31:41.1188286Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_pop_unknown PASSED [ 33%] 2022-05-18T05:31:41.1210910Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_again PASSED [ 44%] 2022-05-18T05:31:41.1234015Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_pop_again PASSED [ 55%] 2022-05-18T05:31:41.1258723Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_pop_together_different_names PASSED [ 66%] 2022-05-18T05:31:41.1278924Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_stash_pop_together_same_name PASSED [ 77%] 2022-05-18T05:31:41.1302248Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_double_stash_pop PASSED [ 88%] 2022-05-18T05:31:41.1329714Z distributed/pipeline/sync/skip/test_verify_skippables.py::test_double_stash_pop_but_isolated PASSED [100%] 2022-05-18T05:31:41.1330889Z 2022-05-18T05:31:41.1331489Z ============================== 9 passed in 0.05s =============================== 2022-05-18T05:31:41.2743972Z Running distributed/pipeline/sync/test_bugs ... [2022-05-18 05:31:41.273947] 2022-05-18T05:31:41.2744621Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_bugs.py', '-v'] ... [2022-05-18 05:31:41.274046] 2022-05-18T05:31:42.5118248Z ============================= test session starts ============================== 2022-05-18T05:31:42.5118861Z platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:31:42.5139426Z cachedir: .pytest_cache 2022-05-18T05:31:42.5140420Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:31:42.5140936Z torch: 1.12.0a0+git3b23752 2022-05-18T05:31:42.5141274Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:31:42.5141674Z plugins: hypothesis-4.53.2 2022-05-18T05:31:42.5376963Z collecting ...  
2022-05-18T05:31:42.5377391Z collected 4 items  2022-05-18T05:31:42.5381611Z 2022-05-18T05:31:42.6196255Z distributed/pipeline/sync/test_bugs.py::test_python_autograd_function PASSED [ 25%] 2022-05-18T05:31:42.6390942Z distributed/pipeline/sync/test_bugs.py::test_exception_no_hang PASSED [ 50%] 2022-05-18T05:31:46.2688642Z distributed/pipeline/sync/test_bugs.py::test_tuple_wait PASSED [ 75%] 2022-05-18T05:31:46.4089109Z distributed/pipeline/sync/test_bugs.py::test_parallel_randoms PASSED [100%] 2022-05-18T05:31:46.4091831Z 2022-05-18T05:31:46.4092420Z ============================== 4 passed in 3.90s =============================== 2022-05-18T05:31:46.6689964Z Running distributed/pipeline/sync/test_copy ... [2022-05-18 05:31:46.668471] 2022-05-18T05:31:46.6690846Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_copy.py', '-v'] ... [2022-05-18 05:31:46.668571] 2022-05-18T05:31:47.9156324Z ============================= test session starts ============================== 2022-05-18T05:31:47.9156931Z platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:31:47.9177562Z cachedir: .pytest_cache 2022-05-18T05:31:47.9178454Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:31:47.9178908Z torch: 1.12.0a0+git3b23752 2022-05-18T05:31:47.9179227Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:31:47.9179619Z plugins: hypothesis-4.53.2 2022-05-18T05:31:47.9418898Z collecting ...  2022-05-18T05:31:47.9419320Z collected 5 items  2022-05-18T05:31:47.9423937Z 2022-05-18T05:31:47.9483315Z distributed/pipeline/sync/test_copy.py::test_copy_wait_cpu_cpu PASSED [ 20%] 2022-05-18T05:31:49.2164121Z distributed/pipeline/sync/test_copy.py::test_copy_wait_cpu_cuda PASSED [ 40%] 2022-05-18T05:31:49.6678285Z distributed/pipeline/sync/test_copy.py::test_copy_wait_cuda_cpu PASSED [ 60%] 2022-05-18T05:31:50.0238460Z distributed/pipeline/sync/test_copy.py::test_copy_wait_cuda_cuda PASSED [ 80%] 2022-05-18T05:31:50.0261862Z distributed/pipeline/sync/test_copy.py::test_wait_multiple_tensors PASSED [100%] 2022-05-18T05:31:50.0263967Z 2022-05-18T05:31:50.0264318Z ============================== 5 passed in 2.11s =============================== 2022-05-18T05:31:50.2410051Z Running distributed/pipeline/sync/test_dependency ... [2022-05-18 05:31:50.240549] 2022-05-18T05:31:50.2410918Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_dependency.py', '-v'] ... [2022-05-18 05:31:50.240652] 2022-05-18T05:31:51.4672925Z ============================= test session starts ============================== 2022-05-18T05:31:51.4673522Z platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:31:51.4693825Z cachedir: .pytest_cache 2022-05-18T05:31:51.4694437Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:31:51.4694887Z torch: 1.12.0a0+git3b23752 2022-05-18T05:31:51.4695215Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:31:51.4695608Z plugins: hypothesis-4.53.2 2022-05-18T05:31:51.5003382Z collecting ...  
2022-05-18T05:31:51.5003807Z collected 6 items  2022-05-18T05:31:51.5008113Z 2022-05-18T05:31:52.7238369Z distributed/pipeline/sync/test_dependency.py::test_fork_join PASSED [ 16%] 2022-05-18T05:31:52.7253163Z distributed/pipeline/sync/test_dependency.py::test_fork_join_enable_grad PASSED [ 33%] 2022-05-18T05:31:52.7271071Z distributed/pipeline/sync/test_dependency.py::test_fork_join_no_grad PASSED [ 50%] 2022-05-18T05:31:52.7287590Z distributed/pipeline/sync/test_dependency.py::test_fork_leak PASSED [ 66%] 2022-05-18T05:31:52.7302223Z distributed/pipeline/sync/test_dependency.py::test_join_when_fork_not_requires_grad PASSED [ 83%] 2022-05-18T05:31:52.7320665Z distributed/pipeline/sync/test_dependency.py::test_join_when_fork_requires_grad PASSED [100%] 2022-05-18T05:31:52.7322675Z 2022-05-18T05:31:52.7323371Z ============================== 6 passed in 1.27s =============================== 2022-05-18T05:31:52.9343694Z Running distributed/pipeline/sync/test_microbatch ... [2022-05-18 05:31:52.933858] 2022-05-18T05:31:52.9344385Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_microbatch.py', '-v'] ... [2022-05-18 05:31:52.933960] 2022-05-18T05:31:54.1477068Z ============================= test session starts ============================== 2022-05-18T05:31:54.1477720Z platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:31:54.1498369Z cachedir: .pytest_cache 2022-05-18T05:31:54.1499248Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:31:54.1499683Z torch: 1.12.0a0+git3b23752 2022-05-18T05:31:54.1500025Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:31:54.1500413Z plugins: hypothesis-4.53.2 2022-05-18T05:31:54.1822652Z collecting ...  2022-05-18T05:31:54.1823095Z collected 10 items  2022-05-18T05:31:54.1827410Z 2022-05-18T05:31:54.1862749Z distributed/pipeline/sync/test_microbatch.py::test_batch_atomic PASSED [ 10%] 2022-05-18T05:31:54.1880163Z distributed/pipeline/sync/test_microbatch.py::test_batch_non_atomic PASSED [ 20%] 2022-05-18T05:31:54.1897593Z distributed/pipeline/sync/test_microbatch.py::test_batch_call PASSED [ 30%] 2022-05-18T05:31:54.1915355Z distributed/pipeline/sync/test_microbatch.py::test_batch_setitem_by_index PASSED [ 40%] 2022-05-18T05:31:54.1932873Z distributed/pipeline/sync/test_microbatch.py::test_batch_setitem_by_slice PASSED [ 50%] 2022-05-18T05:31:54.1953833Z distributed/pipeline/sync/test_microbatch.py::test_check PASSED [ 60%] 2022-05-18T05:31:54.1980957Z distributed/pipeline/sync/test_microbatch.py::test_gather_tensors PASSED [ 70%] 2022-05-18T05:31:54.1998323Z distributed/pipeline/sync/test_microbatch.py::test_gather_tuples PASSED [ 80%] 2022-05-18T05:31:54.2016445Z distributed/pipeline/sync/test_microbatch.py::test_scatter_tensor PASSED [ 90%] 2022-05-18T05:31:54.2038049Z distributed/pipeline/sync/test_microbatch.py::test_scatter_multiple_tensors PASSED [100%] 2022-05-18T05:31:54.2039827Z 2022-05-18T05:31:54.2040373Z ============================== 10 passed in 0.06s ============================== 2022-05-18T05:31:54.3456396Z Running distributed/pipeline/sync/test_pipe ... [2022-05-18 05:31:54.345170] 2022-05-18T05:31:54.3457053Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_pipe.py', '-v'] ... 
[2022-05-18 05:31:54.345277] 2022-05-18T05:31:55.5658866Z ============================= test session starts ============================== 2022-05-18T05:31:55.5659480Z platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:31:55.5680093Z cachedir: .pytest_cache 2022-05-18T05:31:55.5680703Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:31:55.5681130Z torch: 1.12.0a0+git3b23752 2022-05-18T05:31:55.5681472Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:31:55.5681869Z plugins: hypothesis-4.53.2 2022-05-18T05:31:55.6731182Z collecting ...  2022-05-18T05:31:55.6731631Z collected 56 items  2022-05-18T05:31:55.6736100Z 2022-05-18T05:31:55.6782469Z distributed/pipeline/sync/test_pipe.py::test_pipe_without_rpc PASSED [ 1%] 2022-05-18T05:31:55.7557839Z distributed/pipeline/sync/test_pipe.py::test_parameters PASSED [ 3%] 2022-05-18T05:31:55.7712693Z distributed/pipeline/sync/test_pipe.py::test_public_attrs PASSED [ 5%] 2022-05-18T05:31:55.7874507Z distributed/pipeline/sync/test_pipe.py::test_sequential_like PASSED [ 7%] 2022-05-18T05:31:55.8025936Z distributed/pipeline/sync/test_pipe.py::test_chunks_less_than_1 PASSED [ 8%] 2022-05-18T05:31:55.8206096Z distributed/pipeline/sync/test_pipe.py::test_batch_size_indivisible PASSED [ 10%] 2022-05-18T05:31:55.8375287Z distributed/pipeline/sync/test_pipe.py::test_batch_size_small PASSED [ 12%] 2022-05-18T05:31:55.8566648Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_mode PASSED [ 14%] 2022-05-18T05:31:55.8720416Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_mode_invalid PASSED [ 16%] 2022-05-18T05:31:55.8882397Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_mode_when_chunks_1 PASSED [ 17%] 2022-05-18T05:31:55.9055749Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_eval PASSED [ 19%] 2022-05-18T05:31:55.9241141Z distributed/pipeline/sync/test_pipe.py::test_checkpoint_non_float_input PASSED [ 21%] 2022-05-18T05:31:55.9403894Z distributed/pipeline/sync/test_pipe.py::test_no_grad PASSED [ 23%] 2022-05-18T05:31:55.9560665Z distributed/pipeline/sync/test_pipe.py::test_exception PASSED [ 25%] 2022-05-18T05:31:56.1756264Z distributed/pipeline/sync/test_pipe.py::test_exception_early_stop_asap PASSED [ 26%] 2022-05-18T05:31:56.1944317Z distributed/pipeline/sync/test_pipe.py::test_nested_input PASSED [ 28%] 2022-05-18T05:31:56.2121822Z distributed/pipeline/sync/test_pipe.py::test_input_pair PASSED [ 30%] 2022-05-18T05:31:56.2289468Z distributed/pipeline/sync/test_pipe.py::test_multi_sequence_input PASSED [ 32%] 2022-05-18T05:31:56.2462986Z distributed/pipeline/sync/test_pipe.py::test_input_singleton PASSED [ 33%] 2022-05-18T05:31:56.2621845Z distributed/pipeline/sync/test_pipe.py::test_input_varargs PASSED [ 35%] 2022-05-18T05:31:56.2778087Z distributed/pipeline/sync/test_pipe.py::test_non_tensor PASSED [ 37%] 2022-05-18T05:31:56.2947174Z distributed/pipeline/sync/test_pipe.py::test_non_tensor_sequence PASSED [ 39%] 2022-05-18T05:31:56.3206597Z distributed/pipeline/sync/test_pipe.py::test_valid_non_tensor[never] PASSED [ 41%] 2022-05-18T05:31:56.3489092Z distributed/pipeline/sync/test_pipe.py::test_valid_non_tensor[always] PASSED [ 42%] 2022-05-18T05:31:56.3761784Z distributed/pipeline/sync/test_pipe.py::test_valid_non_tensor[except_last] PASSED [ 44%] 2022-05-18T05:31:56.3922729Z distributed/pipeline/sync/test_pipe.py::test_no_tensor_output[never] PASSED [ 46%] 
2022-05-18T05:31:56.4081909Z distributed/pipeline/sync/test_pipe.py::test_no_tensor_output[always] PASSED [ 48%] 2022-05-18T05:31:56.4242122Z distributed/pipeline/sync/test_pipe.py::test_no_tensor_output[except_last] PASSED [ 50%] 2022-05-18T05:31:56.4414311Z distributed/pipeline/sync/test_pipe.py::test_uneven_batch_size[never] PASSED [ 51%] 2022-05-18T05:31:56.4593907Z distributed/pipeline/sync/test_pipe.py::test_uneven_batch_size[always] PASSED [ 53%] 2022-05-18T05:31:56.4771642Z distributed/pipeline/sync/test_pipe.py::test_uneven_batch_size[except_last] PASSED [ 55%] 2022-05-18T05:31:56.4944793Z distributed/pipeline/sync/test_pipe.py::test_no_chunk[never] PASSED [ 57%] 2022-05-18T05:31:56.5125091Z distributed/pipeline/sync/test_pipe.py::test_no_chunk[always] PASSED [ 58%] 2022-05-18T05:31:56.5305621Z distributed/pipeline/sync/test_pipe.py::test_no_chunk[except_last] PASSED [ 60%] 2022-05-18T05:31:56.5545092Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm[never] PASSED [ 62%] 2022-05-18T05:31:56.5773695Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm[always] PASSED [ 64%] 2022-05-18T05:31:56.5998479Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm[except_last] PASSED [ 66%] 2022-05-18T05:31:56.6203054Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm_params[never] PASSED [ 67%] 2022-05-18T05:31:56.6417743Z distributed/pipeline/sync/test_pipe.py::test_deferred_batch_norm_params[always] PASSED [ 69%] 2022-05-18T05:31:56.6577330Z distributed/pipeline/sync/test_pipe.py::test_devices PASSED [ 71%] 2022-05-18T05:31:56.6734460Z distributed/pipeline/sync/test_pipe.py::test_partitions PASSED [ 73%] 2022-05-18T05:31:57.9457577Z distributed/pipeline/sync/test_pipe.py::test_merged_partitions PASSED [ 75%] 2022-05-18T05:31:57.9620295Z distributed/pipeline/sync/test_pipe.py::test_deny_moving PASSED [ 76%] 2022-05-18T05:31:57.9775387Z distributed/pipeline/sync/test_pipe.py::test_empty_module PASSED [ 78%] 2022-05-18T05:31:57.9932977Z distributed/pipeline/sync/test_pipe.py::test_named_children PASSED [ 80%] 2022-05-18T05:31:58.0084529Z distributed/pipeline/sync/test_pipe.py::test_verify_module_non_sequential PASSED [ 82%] 2022-05-18T05:31:58.0239245Z distributed/pipeline/sync/test_pipe.py::test_verify_module_duplicate_children PASSED [ 83%] 2022-05-18T05:31:58.0398265Z distributed/pipeline/sync/test_pipe.py::test_verify_module_params_on_same_device PASSED [ 85%] 2022-05-18T05:31:59.5669028Z distributed/pipeline/sync/test_pipe.py::test_verify_nested_modules PASSED [ 87%] 2022-05-18T05:31:59.5829075Z distributed/pipeline/sync/test_pipe.py::test_verify_module_duplicate_parameters_on_same_device PASSED [ 89%] 2022-05-18T05:31:59.9014000Z distributed/pipeline/sync/test_pipe.py::test_forward_lockstep PASSED [ 91%] 2022-05-18T05:31:59.9189091Z distributed/pipeline/sync/test_pipe.py::test_multiple_inputs[never] PASSED [ 92%] 2022-05-18T05:31:59.9380017Z distributed/pipeline/sync/test_pipe.py::test_multiple_inputs[always] PASSED [ 94%] 2022-05-18T05:31:59.9556177Z distributed/pipeline/sync/test_pipe.py::test_multiple_inputs[except_last] PASSED [ 96%] 2022-05-18T05:31:59.9720173Z distributed/pipeline/sync/test_pipe.py::test_inputs_wrong_device PASSED [ 98%] 2022-05-18T05:32:00.0258891Z distributed/pipeline/sync/test_pipe.py::test_with_device_wrapper PASSED [100%] 2022-05-18T05:32:00.0259197Z 2022-05-18T05:32:00.0259456Z =============================== warnings summary =============================== 2022-05-18T05:32:00.0259861Z 
test/distributed/pipeline/sync/test_pipe.py::test_batch_size_indivisible 2022-05-18T05:32:00.0260301Z test/distributed/pipeline/sync/test_pipe.py::test_batch_size_small 2022-05-18T05:32:00.0260934Z /opt/conda/lib/python3.7/site-packages/_pytest/python.py:192: PytestRemovedIn8Warning: Passing None has been deprecated. 2022-05-18T05:32:00.0261767Z See https://docs.pytest.org/en/latest/how-to/capture-warnings.html#additional-use-cases-of-warnings-in-tests for alternatives in common use cases. 2022-05-18T05:32:00.0262300Z result = testfunction(**testargs) 2022-05-18T05:32:00.0262491Z 2022-05-18T05:32:00.0262771Z -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html 2022-05-18T05:32:00.0263348Z ======================== 56 passed, 2 warnings in 4.46s ======================== 2022-05-18T05:32:00.3093571Z Running distributed/pipeline/sync/test_stream ... [2022-05-18 05:32:00.308880] 2022-05-18T05:32:00.3094227Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_stream.py', '-v'] ... [2022-05-18 05:32:00.308983] 2022-05-18T05:32:01.5015646Z ============================= test session starts ============================== 2022-05-18T05:32:01.5016233Z platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:32:01.5037977Z cachedir: .pytest_cache 2022-05-18T05:32:01.5039787Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:32:01.5040734Z torch: 1.12.0a0+git3b23752 2022-05-18T05:32:01.5041367Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:32:01.5042154Z plugins: hypothesis-4.53.2 2022-05-18T05:32:01.5449436Z collecting ...  2022-05-18T05:32:01.5449896Z collected 19 items  2022-05-18T05:32:01.5454634Z 2022-05-18T05:32:01.5486524Z distributed/pipeline/sync/test_stream.py::TestNewStream::test_new_stream_cpu PASSED [ 5%] 2022-05-18T05:32:02.8017575Z distributed/pipeline/sync/test_stream.py::TestNewStream::test_new_stream_cuda PASSED [ 10%] 2022-05-18T05:32:02.8031002Z distributed/pipeline/sync/test_stream.py::TestCurrentStream::test_current_stream_cpu PASSED [ 15%] 2022-05-18T05:32:02.8045033Z distributed/pipeline/sync/test_stream.py::TestCurrentStream::test_current_stream_cuda PASSED [ 21%] 2022-05-18T05:32:02.8058608Z distributed/pipeline/sync/test_stream.py::TestDefaultStream::test_default_stream_cpu PASSED [ 26%] 2022-05-18T05:32:02.8072643Z distributed/pipeline/sync/test_stream.py::TestDefaultStream::test_default_stream_cuda PASSED [ 31%] 2022-05-18T05:32:02.8085884Z distributed/pipeline/sync/test_stream.py::TestUseDevice::test_use_device_cpu PASSED [ 36%] 2022-05-18T05:32:02.8099689Z distributed/pipeline/sync/test_stream.py::TestUseDevice::test_use_device_cuda PASSED [ 42%] 2022-05-18T05:32:02.8112687Z distributed/pipeline/sync/test_stream.py::TestUseStream::test_use_stream_cpu PASSED [ 47%] 2022-05-18T05:32:02.8127044Z distributed/pipeline/sync/test_stream.py::TestUseStream::test_use_stream_cuda PASSED [ 52%] 2022-05-18T05:32:02.8140590Z distributed/pipeline/sync/test_stream.py::TestGetDevice::test_get_device_cpu PASSED [ 57%] 2022-05-18T05:32:02.8155030Z distributed/pipeline/sync/test_stream.py::TestGetDevice::test_get_device_cuda PASSED [ 63%] 2022-05-18T05:32:02.8355613Z distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cpu_cpu PASSED [ 68%] 2022-05-18T05:32:03.3225405Z distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cpu_cuda PASSED [ 
73%] 2022-05-18T05:32:03.3242834Z distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cuda_cpu PASSED [ 78%] 2022-05-18T05:32:03.8097580Z distributed/pipeline/sync/test_stream.py::TestWaitStream::test_wait_stream_cuda_cuda PASSED [ 84%] 2022-05-18T05:32:03.8113374Z distributed/pipeline/sync/test_stream.py::TestRecordStream::test_record_stream_cpu PASSED [ 89%] 2022-05-18T05:32:04.2972802Z distributed/pipeline/sync/test_stream.py::TestRecordStream::test_record_stream_cuda PASSED [ 94%] 2022-05-18T05:32:04.3000496Z distributed/pipeline/sync/test_stream.py::TestRecordStream::test_record_stream_shifted_view PASSED [100%] 2022-05-18T05:32:04.3001642Z 2022-05-18T05:32:04.3002350Z ============================== 19 passed in 2.80s ============================== 2022-05-18T05:32:04.8129137Z Running distributed/pipeline/sync/test_worker ... [2022-05-18 05:32:04.812457] 2022-05-18T05:32:04.8129785Z Executing ['/opt/conda/bin/python', '-m', 'pytest', 'distributed/pipeline/sync/test_worker.py', '-v'] ... [2022-05-18 05:32:04.812558] 2022-05-18T05:32:06.0611029Z ============================= test session starts ============================== 2022-05-18T05:32:06.0611618Z platform linux -- Python 3.7.13, pytest-7.1.2, pluggy-1.0.0 -- /opt/conda/bin/python 2022-05-18T05:32:06.0631723Z cachedir: .pytest_cache 2022-05-18T05:32:06.0632323Z hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/var/lib/jenkins/workspace/test/.hypothesis/examples') 2022-05-18T05:32:06.0633085Z torch: 1.12.0a0+git3b23752 2022-05-18T05:32:06.0633456Z rootdir: /var/lib/jenkins/workspace, configfile: pytest.ini 2022-05-18T05:32:06.0633849Z plugins: hypothesis-4.53.2 2022-05-18T05:32:06.0842721Z collecting ...  2022-05-18T05:32:06.0843149Z collected 6 items  2022-05-18T05:32:06.0847162Z 2022-05-18T05:32:06.0887517Z distributed/pipeline/sync/test_worker.py::test_compute_multithreading PASSED [ 16%] 2022-05-18T05:32:06.0909856Z distributed/pipeline/sync/test_worker.py::test_compute_success PASSED [ 33%] 2022-05-18T05:32:06.0928885Z distributed/pipeline/sync/test_worker.py::test_compute_exception PASSED [ 50%] 2022-05-18T05:32:06.0959419Z distributed/pipeline/sync/test_worker.py::test_grad_mode[True] PASSED [ 66%] 2022-05-18T05:32:06.0980628Z distributed/pipeline/sync/test_worker.py::test_grad_mode[False] PASSED [ 83%] 2022-05-18T05:32:06.1008112Z distributed/pipeline/sync/test_worker.py::test_worker_per_device PASSED [100%] 2022-05-18T05:32:06.1010023Z 2022-05-18T05:32:06.1010336Z ============================== 6 passed in 0.04s =============================== 2022-05-18T05:32:06.2378822Z Running distributed/rpc/test_tensorpipe_agent ... [2022-05-18 05:32:06.237471] 2022-05-18T05:32:06.2379593Z Executing ['/opt/conda/bin/python', 'distributed/rpc/test_tensorpipe_agent.py', '-v', '--subprocess', '--import-slow-tests', '--import-disabled-tests'] ... 
[2022-05-18 05:32:06.237571] 2022-05-18T05:32:07.1299620Z INFO:torch.distributed.nn.jit.instantiator:Created a temporary directory at /tmp/tmpc55437jf 2022-05-18T05:32:07.1300509Z INFO:torch.distributed.nn.jit.instantiator:Writing /tmp/tmpc55437jf/_remote_module_non_scriptable.py 2022-05-18T05:32:08.3817546Z 2022-05-18T05:32:08.3817902Z real 70m55.031s 2022-05-18T05:32:08.3818193Z user 134m18.411s 2022-05-18T05:32:08.3818465Z sys 101m27.875s 2022-05-18T05:32:08.3818722Z + assert_git_not_dirty 2022-05-18T05:32:08.3819286Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed != *rocm* ]] 2022-05-18T05:32:08.3819811Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed != *xla* ]] 2022-05-18T05:32:08.3821238Z ++ git status --porcelain 2022-05-18T05:32:09.0733570Z + git_status= 2022-05-18T05:32:09.0734054Z + [[ -n '' ]] 2022-05-18T05:32:09.0734510Z + [[ linux-xenial-cuda11.3-py3.7-gcc7-distributed == *cuda* ]] 2022-05-18T05:32:09.0734829Z + [[ 2 == 1 ]] 2022-05-18T05:32:09.0735093Z + [[ 2 == 1 ]] 2022-05-18T05:32:09.0735326Z + cleanup 2022-05-18T05:32:09.0735557Z + retcode=0 2022-05-18T05:32:09.0735770Z + set +x 2022-05-18T05:32:09.0736009Z EXITED_USER_LAND 2022-05-18T05:32:09.0816619Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master 2022-05-18T05:32:09.0816967Z with: 2022-05-18T05:32:09.0817537Z github-token: *** 2022-05-18T05:32:09.0817786Z env: 2022-05-18T05:32:09.0817987Z IN_CI: 1 2022-05-18T05:32:09.0818214Z IS_GHA: 1 2022-05-18T05:32:09.0818466Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:32:09.0818719Z GPU_FLAG: --gpus all 2022-05-18T05:32:09.0818968Z ##[endgroup] 2022-05-18T05:32:09.0849968Z ##[group]Run nick-fields/retry@71062288b76e2b6214ebde0e673ce0de1755740a 2022-05-18T05:32:09.0850794Z with: 2022-05-18T05:32:09.0851013Z shell: bash 2022-05-18T05:32:09.0851260Z timeout_minutes: 10 2022-05-18T05:32:09.0851512Z max_attempts: 5 2022-05-18T05:32:09.0851746Z retry_wait_seconds: 30 2022-05-18T05:32:09.0852293Z command: set -x python3 -m pip install requests==2.26.0 GHA_WORKFLOW_JOB_ID=$(python3 .github/scripts/get_workflow_job_id.py "${GITHUB_RUN_ID}" "${RUNNER_NAME}") echo "::set-output name=job-id::${GHA_WORKFLOW_JOB_ID}" 2022-05-18T05:32:09.0852938Z polling_interval_seconds: 1 2022-05-18T05:32:09.0853195Z warning_on_retry: true 2022-05-18T05:32:09.0853461Z continue_on_error: false 2022-05-18T05:32:09.0853705Z env: 2022-05-18T05:32:09.0853903Z IN_CI: 1 2022-05-18T05:32:09.0854126Z IS_GHA: 1 2022-05-18T05:32:09.0854374Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:32:09.0854625Z GPU_FLAG: --gpus all 2022-05-18T05:32:09.0855025Z GITHUB_TOKEN: *** 2022-05-18T05:32:09.0855274Z ##[endgroup] 2022-05-18T05:32:09.1293101Z 2022-05-18T05:32:09.1366319Z + python3 -m pip install requests==2.26.0 2022-05-18T05:32:09.4195416Z Defaulting to user installation because normal site-packages is not writeable 2022-05-18T05:32:09.4406618Z Requirement already satisfied: requests==2.26.0 in /home/ec2-user/.local/lib/python3.7/site-packages (2.26.0) 2022-05-18T05:32:09.4587383Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (1.26.9) 2022-05-18T05:32:09.4798082Z Requirement already satisfied: charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (2.0.12) 2022-05-18T05:32:09.4824057Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (3.3) 
2022-05-18T05:32:09.4839388Z Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests==2.26.0) (2021.10.8) 2022-05-18T05:32:09.5922951Z ++ python3 .github/scripts/get_workflow_job_id.py 2342799944 i-0f05d6101f258be9b 2022-05-18T05:32:11.3003724Z + GHA_WORKFLOW_JOB_ID=6482805675 2022-05-18T05:32:11.3004761Z + echo '::set-output name=job-id::6482805675' 2022-05-18T05:32:12.1375124Z Command completed after 1 attempt(s). 2022-05-18T05:32:12.1375422Z 2022-05-18T05:32:12.1530786Z Prepare all required actions 2022-05-18T05:32:12.1531302Z Getting action download info 2022-05-18T05:32:12.3528092Z Download action repository 'actions/upload-artifact@v2' (SHA:82c141cc518b40d92cc801eee768e7aafc9c2fa2) 2022-05-18T05:32:12.4739019Z ##[group]Run ./.github/actions/upload-test-artifacts 2022-05-18T05:32:12.4739317Z with: 2022-05-18T05:32:12.4739694Z file-suffix: test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482805675 2022-05-18T05:32:12.4740054Z env: 2022-05-18T05:32:12.4740266Z IN_CI: 1 2022-05-18T05:32:12.4740510Z IS_GHA: 1 2022-05-18T05:32:12.4740776Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:32:12.4741036Z GPU_FLAG: --gpus all 2022-05-18T05:32:12.4741299Z ##[endgroup] 2022-05-18T05:32:12.4771724Z ##[group]Run # Remove any previous test jsons if they exist 2022-05-18T05:32:12.4772121Z # Remove any previous test jsons if they exist 2022-05-18T05:32:12.4772430Z rm -f test-jsons-*.zip 2022-05-18T05:32:12.4772850Z zip -r "test-jsons-${FILE_SUFFIX}.zip" test -i '*.json' 2022-05-18T05:32:12.4785650Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T05:32:12.4785965Z env: 2022-05-18T05:32:12.4786201Z IN_CI: 1 2022-05-18T05:32:12.4786420Z IS_GHA: 1 2022-05-18T05:32:12.4786680Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:32:12.4786963Z GPU_FLAG: --gpus all 2022-05-18T05:32:12.4787326Z FILE_SUFFIX: test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482805675 2022-05-18T05:32:12.4787692Z ##[endgroup] 2022-05-18T05:32:12.4908090Z adding: test/allowlist_for_publicAPI.json (deflated 82%) 2022-05-18T05:32:12.4943171Z adding: test/benchmark_utils/callgrind_artifacts.json (deflated 92%) 2022-05-18T05:32:12.4944226Z adding: test/.pytorch-slow-tests.json (deflated 71%) 2022-05-18T05:32:12.4948466Z adding: test/.pytorch-disabled-tests.json (deflated 83%) 2022-05-18T05:32:12.4972785Z ##[group]Run # Remove any previous test reports if they exist 2022-05-18T05:32:12.4973203Z # Remove any previous test reports if they exist 2022-05-18T05:32:12.4973722Z rm -f test-reports-*.zip 2022-05-18T05:32:12.4974060Z zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml' 2022-05-18T05:32:12.4986184Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T05:32:12.4986495Z env: 2022-05-18T05:32:12.4986709Z IN_CI: 1 2022-05-18T05:32:12.4986944Z IS_GHA: 1 2022-05-18T05:32:12.4987210Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:32:12.4987466Z GPU_FLAG: --gpus all 2022-05-18T05:32:12.4987849Z FILE_SUFFIX: test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482805675 2022-05-18T05:32:12.4988216Z ##[endgroup] 2022-05-18T05:32:12.5105793Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDdpComparisonTest-20220518042121.xml (deflated 42%) 2022-05-18T05:32:12.5106773Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDistAutogradTest-20220518042126.xml (deflated 41%) 2022-05-18T05:32:12.5107715Z adding: 
test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDistAutogradTest-20220518042133.xml (deflated 42%) 2022-05-18T05:32:12.5108610Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaDistAutogradTest-20220518042140.xml (deflated 42%) 2022-05-18T05:32:12.5109493Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518042147.xml (deflated 41%) 2022-05-18T05:32:12.5110401Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518042153.xml (deflated 41%) 2022-05-18T05:32:12.5111302Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518042159.xml (deflated 41%) 2022-05-18T05:32:12.5112335Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRemoteModuleTest-20220518042204.xml (deflated 41%) 2022-05-18T05:32:12.5113243Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeCudaRpcTest-20220518042210.xml (deflated 41%) 2022-05-18T05:32:12.5114101Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042219.xml (deflated 41%) 2022-05-18T05:32:12.5114977Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042224.xml (deflated 41%) 2022-05-18T05:32:12.5115848Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042229.xml (deflated 41%) 2022-05-18T05:32:12.5116727Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042233.xml (deflated 41%) 2022-05-18T05:32:12.5117582Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042238.xml (deflated 41%) 2022-05-18T05:32:12.5118448Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042243.xml (deflated 41%) 2022-05-18T05:32:12.5119310Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042247.xml (deflated 41%) 2022-05-18T05:32:12.5120166Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipePipeWithDDPTest-20220518042252.xml (deflated 41%) 2022-05-18T05:32:12.5121086Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042257.xml (deflated 43%) 2022-05-18T05:32:12.5122063Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042306.xml (deflated 43%) 2022-05-18T05:32:12.5123149Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042318.xml (deflated 42%) 2022-05-18T05:32:12.5124116Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042329.xml (deflated 44%) 2022-05-18T05:32:12.5125083Z adding: 
test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042340.xml (deflated 44%) 2022-05-18T05:32:12.5126037Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042350.xml (deflated 44%) 2022-05-18T05:32:12.5126978Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042400.xml (deflated 44%) 2022-05-18T05:32:12.5127943Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042410.xml (deflated 44%) 2022-05-18T05:32:12.5128902Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042420.xml (deflated 44%) 2022-05-18T05:32:12.5129855Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042430.xml (deflated 44%) 2022-05-18T05:32:12.5131695Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042435.xml (deflated 43%) 2022-05-18T05:32:12.5132714Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042440.xml (deflated 43%) 2022-05-18T05:32:12.5133780Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042445.xml (deflated 43%) 2022-05-18T05:32:12.5134765Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042449.xml (deflated 43%) 2022-05-18T05:32:12.5135700Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042456.xml (deflated 43%) 2022-05-18T05:32:12.5136647Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042502.xml (deflated 43%) 2022-05-18T05:32:12.5137607Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042514.xml (deflated 44%) 2022-05-18T05:32:12.5138556Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042526.xml (deflated 43%) 2022-05-18T05:32:12.5139556Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042544.xml (deflated 44%) 2022-05-18T05:32:12.5140519Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042557.xml (deflated 43%) 2022-05-18T05:32:12.5141450Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042609.xml (deflated 43%) 2022-05-18T05:32:12.5142401Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042614.xml (deflated 43%) 2022-05-18T05:32:12.5143350Z adding: 
test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042622.xml (deflated 43%) 2022-05-18T05:32:12.5144401Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042630.xml (deflated 43%) 2022-05-18T05:32:12.5145325Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042638.xml (deflated 44%) 2022-05-18T05:32:12.5146277Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042649.xml (deflated 43%) 2022-05-18T05:32:12.5147226Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042659.xml (deflated 43%) 2022-05-18T05:32:12.5148189Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042709.xml (deflated 43%) 2022-05-18T05:32:12.5149142Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042719.xml (deflated 43%) 2022-05-18T05:32:12.5150082Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042729.xml (deflated 43%) 2022-05-18T05:32:12.5151030Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042740.xml (deflated 43%) 2022-05-18T05:32:12.5151988Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042750.xml (deflated 43%) 2022-05-18T05:32:12.5152936Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042800.xml (deflated 44%) 2022-05-18T05:32:12.5153923Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042810.xml (deflated 43%) 2022-05-18T05:32:12.5154891Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042821.xml (deflated 43%) 2022-05-18T05:32:12.5155845Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042831.xml (deflated 43%) 2022-05-18T05:32:12.5156800Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042841.xml (deflated 43%) 2022-05-18T05:32:12.5157730Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042851.xml (deflated 43%) 2022-05-18T05:32:12.5158680Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042901.xml (deflated 43%) 2022-05-18T05:32:12.5159642Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042912.xml (deflated 43%) 2022-05-18T05:32:12.5160591Z adding: 
test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042922.xml (deflated 43%) 2022-05-18T05:32:12.5161528Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042932.xml (deflated 42%) 2022-05-18T05:32:12.5162456Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042940.xml (deflated 43%) 2022-05-18T05:32:12.5163406Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042950.xml (deflated 43%) 2022-05-18T05:32:12.5164398Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518042958.xml (deflated 43%) 2022-05-18T05:32:12.5165417Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043006.xml (deflated 43%) 2022-05-18T05:32:12.5166349Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043016.xml (deflated 43%) 2022-05-18T05:32:12.5167303Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043026.xml (deflated 43%) 2022-05-18T05:32:12.5168254Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043031.xml (deflated 44%) 2022-05-18T05:32:12.5169206Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043036.xml (deflated 43%) 2022-05-18T05:32:12.5170158Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043041.xml (deflated 43%) 2022-05-18T05:32:12.5171968Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043046.xml (deflated 43%) 2022-05-18T05:32:12.5172939Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043053.xml (deflated 43%) 2022-05-18T05:32:12.5173889Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043059.xml (deflated 43%) 2022-05-18T05:32:12.5174829Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043106.xml (deflated 43%) 2022-05-18T05:32:12.5175935Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043113.xml (deflated 43%) 2022-05-18T05:32:12.5176912Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043120.xml (deflated 43%) 2022-05-18T05:32:12.5177866Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043127.xml (deflated 42%) 2022-05-18T05:32:12.5178814Z adding: 
test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043134.xml (deflated 43%) 2022-05-18T05:32:12.5179746Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043144.xml (deflated 43%) 2022-05-18T05:32:12.5180700Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043154.xml (deflated 43%) 2022-05-18T05:32:12.5181659Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043159.xml (deflated 43%) 2022-05-18T05:32:12.5182602Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043209.xml (deflated 44%) 2022-05-18T05:32:12.5183734Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043214.xml (deflated 44%) 2022-05-18T05:32:12.5184657Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043219.xml (deflated 43%) 2022-05-18T05:32:12.5185608Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043223.xml (deflated 43%) 2022-05-18T05:32:12.5186653Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043231.xml (deflated 43%) 2022-05-18T05:32:12.5187601Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043236.xml (deflated 43%) 2022-05-18T05:32:12.5188531Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043241.xml (deflated 44%) 2022-05-18T05:32:12.5189477Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043249.xml (deflated 42%) 2022-05-18T05:32:12.5190432Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043259.xml (deflated 42%) 2022-05-18T05:32:12.5191384Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043310.xml (deflated 42%) 2022-05-18T05:32:12.5192338Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043321.xml (deflated 42%) 2022-05-18T05:32:12.5193262Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043330.xml (deflated 43%) 2022-05-18T05:32:12.5194209Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043348.xml (deflated 42%) 2022-05-18T05:32:12.5195160Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043408.xml (deflated 43%) 2022-05-18T05:32:12.5196151Z adding: 
test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043426.xml (deflated 43%) 2022-05-18T05:32:12.5197097Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043446.xml (deflated 43%) 2022-05-18T05:32:12.5198045Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043504.xml (deflated 43%) 2022-05-18T05:32:12.5198998Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043521.xml (deflated 42%) 2022-05-18T05:32:12.5199938Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043538.xml (deflated 42%) 2022-05-18T05:32:12.5200891Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043555.xml (deflated 42%) 2022-05-18T05:32:12.5201821Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043611.xml (deflated 42%) 2022-05-18T05:32:12.5202772Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043629.xml (deflated 42%) 2022-05-18T05:32:12.5203723Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043649.xml (deflated 42%) 2022-05-18T05:32:12.5204671Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043707.xml (deflated 42%) 2022-05-18T05:32:12.5205600Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043728.xml (deflated 43%) 2022-05-18T05:32:12.5206622Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeAgentCudaRpcTest-20220518043736.xml (deflated 43%) 2022-05-18T05:32:12.5207597Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeCudaDistAutogradTest-20220518043746.xml (deflated 44%) 2022-05-18T05:32:12.5208587Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeCudaDistAutogradTest-20220518043751.xml (deflated 44%) 2022-05-18T05:32:12.5209573Z adding: test/test-reports/python-unittest/distributed.rpc.cuda.test_tensorpipe_agent/TEST-TensorPipeTensorPipeCudaDistAutogradTest-20220518043756.xml (deflated 44%) 2022-05-18T05:32:12.5210767Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestHooks-20220518043800.xml (deflated 79%) 2022-05-18T05:32:12.5231881Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestNoGrad-20220518043800.xml (deflated 55%) 2022-05-18T05:32:12.5232653Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParamInit-20220518043800.xml (deflated 55%) 2022-05-18T05:32:12.5233423Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_core/TEST-TestParityWithDDP-20220518043800.xml (deflated 95%) 2022-05-18T05:32:12.5234125Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045023.xml (deflated 39%) 2022-05-18T05:32:12.5234792Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045029.xml (deflated 39%) 2022-05-18T05:32:12.5235464Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045036.xml (deflated 40%) 2022-05-18T05:32:12.5236131Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045040.xml (deflated 39%) 2022-05-18T05:32:12.5236926Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045045.xml (deflated 38%) 2022-05-18T05:32:12.5237616Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045049.xml (deflated 40%) 2022-05-18T05:32:12.5238278Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045053.xml (deflated 40%) 2022-05-18T05:32:12.5238916Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045057.xml (deflated 40%) 2022-05-18T05:32:12.5239580Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045101.xml (deflated 38%) 2022-05-18T05:32:12.5240241Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045107.xml (deflated 38%) 2022-05-18T05:32:12.5240907Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045112.xml (deflated 38%) 2022-05-18T05:32:12.5241554Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045118.xml (deflated 39%) 2022-05-18T05:32:12.5242219Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045124.xml (deflated 38%) 2022-05-18T05:32:12.5242880Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045130.xml (deflated 40%) 2022-05-18T05:32:12.5243541Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045134.xml (deflated 38%) 2022-05-18T05:32:12.5244177Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-CommTest-20220518045139.xml (deflated 38%) 2022-05-18T05:32:12.5244925Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045144.xml (deflated 42%) 2022-05-18T05:32:12.5245742Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045151.xml (deflated 42%) 2022-05-18T05:32:12.5246635Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045158.xml (deflated 41%) 2022-05-18T05:32:12.5247425Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045205.xml (deflated 42%) 2022-05-18T05:32:12.5248241Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045211.xml (deflated 41%) 2022-05-18T05:32:12.5249041Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045218.xml (deflated 42%) 2022-05-18T05:32:12.5249847Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045224.xml (deflated 42%) 2022-05-18T05:32:12.5251367Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045231.xml (deflated 41%) 2022-05-18T05:32:12.5252334Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045237.xml (deflated 45%) 2022-05-18T05:32:12.5253136Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045243.xml (deflated 45%) 2022-05-18T05:32:12.5253917Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045249.xml (deflated 43%) 2022-05-18T05:32:12.5254717Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045255.xml (deflated 43%) 2022-05-18T05:32:12.5255507Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045301.xml (deflated 46%) 2022-05-18T05:32:12.5256296Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045307.xml (deflated 46%) 2022-05-18T05:32:12.5257165Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045313.xml (deflated 47%) 2022-05-18T05:32:12.5257988Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045318.xml (deflated 47%) 2022-05-18T05:32:12.5258790Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045324.xml (deflated 45%) 2022-05-18T05:32:12.5259590Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045330.xml (deflated 45%) 2022-05-18T05:32:12.5260359Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045336.xml (deflated 46%) 2022-05-18T05:32:12.5261161Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045342.xml (deflated 44%) 2022-05-18T05:32:12.5261956Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045348.xml (deflated 44%) 2022-05-18T05:32:12.5262750Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045354.xml (deflated 42%) 2022-05-18T05:32:12.5263549Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045400.xml (deflated 42%) 2022-05-18T05:32:12.5264351Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045407.xml (deflated 42%) 2022-05-18T05:32:12.5265141Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045413.xml (deflated 45%) 2022-05-18T05:32:12.5265929Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045420.xml (deflated 44%) 2022-05-18T05:32:12.5266724Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045426.xml (deflated 42%) 2022-05-18T05:32:12.5267617Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045430.xml (deflated 41%) 2022-05-18T05:32:12.5268406Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045437.xml (deflated 41%) 2022-05-18T05:32:12.5269195Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045441.xml (deflated 42%) 2022-05-18T05:32:12.5269978Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045447.xml (deflated 42%) 2022-05-18T05:32:12.5270739Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045454.xml (deflated 43%) 2022-05-18T05:32:12.5271509Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045501.xml (deflated 41%) 2022-05-18T05:32:12.5272304Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045506.xml (deflated 41%) 2022-05-18T05:32:12.5273098Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045512.xml (deflated 41%) 2022-05-18T05:32:12.5273862Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045518.xml (deflated 41%) 2022-05-18T05:32:12.5274645Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045524.xml (deflated 41%) 2022-05-18T05:32:12.5275427Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045530.xml (deflated 41%) 2022-05-18T05:32:12.5276264Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045535.xml (deflated 42%) 2022-05-18T05:32:12.5277059Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045542.xml (deflated 41%) 2022-05-18T05:32:12.5277854Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045549.xml (deflated 42%) 2022-05-18T05:32:12.5278642Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045555.xml (deflated 41%) 2022-05-18T05:32:12.5279435Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045602.xml (deflated 41%) 2022-05-18T05:32:12.5280211Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045611.xml (deflated 43%) 2022-05-18T05:32:12.5281010Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045615.xml (deflated 41%) 2022-05-18T05:32:12.5281808Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045619.xml (deflated 42%) 2022-05-18T05:32:12.5282603Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045625.xml (deflated 42%) 2022-05-18T05:32:12.5283379Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045632.xml (deflated 41%) 2022-05-18T05:32:12.5284365Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045639.xml (deflated 42%) 2022-05-18T05:32:12.5285146Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045646.xml (deflated 42%) 2022-05-18T05:32:12.5285939Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045650.xml (deflated 42%) 2022-05-18T05:32:12.5286784Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045654.xml (deflated 41%) 2022-05-18T05:32:12.5287570Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045659.xml (deflated 43%) 2022-05-18T05:32:12.5288356Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045703.xml (deflated 42%) 2022-05-18T05:32:12.5289148Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045710.xml (deflated 42%) 2022-05-18T05:32:12.5289922Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045717.xml (deflated 42%) 2022-05-18T05:32:12.5291395Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045739.xml (deflated 44%) 2022-05-18T05:32:12.5292263Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045746.xml (deflated 42%) 2022-05-18T05:32:12.5293064Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045752.xml (deflated 42%) 2022-05-18T05:32:12.5293847Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045756.xml (deflated 41%) 2022-05-18T05:32:12.5294627Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045803.xml (deflated 41%) 2022-05-18T05:32:12.5295414Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045809.xml (deflated 41%) 2022-05-18T05:32:12.5296211Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-DistributedDataParallelTest-20220518045817.xml (deflated 41%) 2022-05-18T05:32:12.5297079Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045824.xml (deflated 42%) 2022-05-18T05:32:12.5297844Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045828.xml (deflated 41%) 2022-05-18T05:32:12.5298586Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045832.xml (deflated 42%) 2022-05-18T05:32:12.5299344Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045833.xml (deflated 42%) 2022-05-18T05:32:12.5300087Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045837.xml (deflated 41%) 2022-05-18T05:32:12.5300811Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045841.xml (deflated 41%) 2022-05-18T05:32:12.5301574Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045845.xml (deflated 41%) 2022-05-18T05:32:12.5302329Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045849.xml (deflated 43%) 
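Each TEST-<ClassName>-<timestamp>.xml entry being archived here is a JUnit-style report for a single test class; the `python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test` step later in this log aggregates results from the same test directory. As a minimal illustrative sketch only (not the repository's actual tooling; the directory layout is taken from the paths above, and the attribute names are assumed from the usual xunit schema), such reports could be tallied like this:

# Illustrative sketch: tally results from JUnit-style TEST-*.xml reports.
# Assumes <testsuite> elements carry tests/failures/errors/skipped/time
# attributes, as produced by Python's unittest XML reporting.
import glob
import xml.etree.ElementTree as ET

def summarize(report_dir="test/test-reports"):
    totals = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0, "time": 0.0}
    for path in glob.glob(f"{report_dir}/**/TEST-*.xml", recursive=True):
        root = ET.parse(path).getroot()
        # iter() also matches the root itself when a file holds a single <testsuite>.
        for suite in root.iter("testsuite"):
            for key in ("tests", "failures", "errors", "skipped"):
                totals[key] += int(suite.get(key, 0))
            totals["time"] += float(suite.get("time", 0.0))
    return totals

if __name__ == "__main__":
    print(summarize())

Run from the repository root, this would print a single dict of totals across every report under test/test-reports.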
2022-05-18T05:32:12.5303076Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-NcclErrorHandlingTest-20220518045850.xml (deflated 43%) 2022-05-18T05:32:12.5303840Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLNoGPUTest-20220518045854.xml (deflated 42%) 2022-05-18T05:32:12.5304618Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045855.xml (deflated 39%) 2022-05-18T05:32:12.5305366Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045901.xml (deflated 39%) 2022-05-18T05:32:12.5306117Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045907.xml (deflated 39%) 2022-05-18T05:32:12.5306931Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045914.xml (deflated 39%) 2022-05-18T05:32:12.5307676Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045920.xml (deflated 39%) 2022-05-18T05:32:12.5308416Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045927.xml (deflated 39%) 2022-05-18T05:32:12.5309146Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045933.xml (deflated 38%) 2022-05-18T05:32:12.5309876Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045940.xml (deflated 39%) 2022-05-18T05:32:12.5310620Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045945.xml (deflated 39%) 2022-05-18T05:32:12.5311362Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518045952.xml (deflated 38%) 2022-05-18T05:32:12.5312111Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050000.xml (deflated 39%) 2022-05-18T05:32:12.5312834Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050007.xml (deflated 39%) 2022-05-18T05:32:12.5313573Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050012.xml (deflated 39%) 2022-05-18T05:32:12.5314319Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050019.xml (deflated 39%) 2022-05-18T05:32:12.5315062Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050025.xml (deflated 39%) 2022-05-18T05:32:12.5315835Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050031.xml (deflated 39%) 2022-05-18T05:32:12.5316590Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-ProcessGroupNCCLTest-20220518050037.xml (deflated 39%) 2022-05-18T05:32:12.5317323Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-RendezvousEnvTest-20220518050046.xml (deflated 40%) 2022-05-18T05:32:12.5318062Z adding: test/test-reports/python-unittest/distributed.test_c10d_nccl/TEST-TimeoutTest-20220518050049.xml (deflated 40%) 2022-05-18T05:32:12.5318846Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_mixed_precision/TEST-TestFSDPMixedPrecisionSharded-20220518050056.xml (deflated 94%) 2022-05-18T05:32:12.5319745Z adding: 
test/test-reports/python-unittest/distributed.fsdp.test_fsdp_mixed_precision/TEST-TestFSDPMixedPrecisionUnsharded-20220518050056.xml (deflated 57%) 2022-05-18T05:32:12.5320527Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050738.xml (deflated 39%) 2022-05-18T05:32:12.5321222Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050742.xml (deflated 38%) 2022-05-18T05:32:12.5321871Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050747.xml (deflated 38%) 2022-05-18T05:32:12.5322538Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050751.xml (deflated 38%) 2022-05-18T05:32:12.5323197Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050757.xml (deflated 38%) 2022-05-18T05:32:12.5323850Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050802.xml (deflated 40%) 2022-05-18T05:32:12.5324485Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050806.xml (deflated 38%) 2022-05-18T05:32:12.5325135Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-CommTest-20220518050810.xml (deflated 39%) 2022-05-18T05:32:12.5325942Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050814.xml (deflated 45%) 2022-05-18T05:32:12.5326744Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050820.xml (deflated 45%) 2022-05-18T05:32:12.5327536Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050826.xml (deflated 43%) 2022-05-18T05:32:12.5328337Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050832.xml (deflated 43%) 2022-05-18T05:32:12.5329140Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050838.xml (deflated 46%) 2022-05-18T05:32:12.5329943Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050844.xml (deflated 46%) 2022-05-18T05:32:12.5331072Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050850.xml (deflated 47%) 2022-05-18T05:32:12.5331866Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050856.xml (deflated 47%) 2022-05-18T05:32:12.5332662Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050902.xml (deflated 45%) 2022-05-18T05:32:12.5333462Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050907.xml (deflated 46%) 2022-05-18T05:32:12.5334240Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050913.xml (deflated 46%) 2022-05-18T05:32:12.5335032Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050919.xml (deflated 44%) 2022-05-18T05:32:12.5335896Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050925.xml (deflated 44%) 2022-05-18T05:32:12.5336709Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050931.xml (deflated 43%) 2022-05-18T05:32:12.5337487Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050935.xml (deflated 44%) 2022-05-18T05:32:12.5338280Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050940.xml (deflated 45%) 2022-05-18T05:32:12.5339066Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050944.xml (deflated 44%) 2022-05-18T05:32:12.5339855Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050948.xml (deflated 45%) 2022-05-18T05:32:12.5340630Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050952.xml (deflated 45%) 2022-05-18T05:32:12.5341427Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518050956.xml (deflated 50%) 2022-05-18T05:32:12.5342210Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051002.xml (deflated 42%) 2022-05-18T05:32:12.5343001Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051007.xml (deflated 42%) 2022-05-18T05:32:12.5343769Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051013.xml (deflated 41%) 2022-05-18T05:32:12.5344565Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051018.xml (deflated 42%) 2022-05-18T05:32:12.5345365Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051024.xml (deflated 42%) 2022-05-18T05:32:12.5346237Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051030.xml (deflated 42%) 2022-05-18T05:32:12.5347033Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051034.xml (deflated 42%) 2022-05-18T05:32:12.5347806Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051038.xml (deflated 41%) 2022-05-18T05:32:12.5348589Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051042.xml (deflated 41%) 2022-05-18T05:32:12.5349384Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051046.xml (deflated 45%) 2022-05-18T05:32:12.5350177Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051050.xml (deflated 46%) 2022-05-18T05:32:12.5350956Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051054.xml (deflated 41%) 2022-05-18T05:32:12.5351737Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051058.xml (deflated 41%) 2022-05-18T05:32:12.5352524Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051104.xml (deflated 42%) 2022-05-18T05:32:12.5353312Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051108.xml (deflated 42%) 2022-05-18T05:32:12.5354088Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051112.xml (deflated 41%) 2022-05-18T05:32:12.5354933Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-DistributedDataParallelTest-20220518051119.xml (deflated 41%) 2022-05-18T05:32:12.5355725Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051125.xml (deflated 40%) 2022-05-18T05:32:12.5356482Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051129.xml (deflated 40%) 2022-05-18T05:32:12.5357212Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051135.xml (deflated 39%) 2022-05-18T05:32:12.5357964Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051140.xml (deflated 40%) 2022-05-18T05:32:12.5358715Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051144.xml (deflated 39%) 2022-05-18T05:32:12.5359461Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051148.xml (deflated 40%) 2022-05-18T05:32:12.5360198Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051152.xml (deflated 40%) 2022-05-18T05:32:12.5360947Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051157.xml (deflated 39%) 2022-05-18T05:32:12.5361693Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051204.xml (deflated 40%) 2022-05-18T05:32:12.5362432Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051208.xml (deflated 40%) 2022-05-18T05:32:12.5363155Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051214.xml (deflated 39%) 2022-05-18T05:32:12.5363898Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051220.xml (deflated 40%) 2022-05-18T05:32:12.5364669Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051224.xml (deflated 40%) 2022-05-18T05:32:12.5365486Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051228.xml (deflated 40%) 2022-05-18T05:32:12.5366211Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051232.xml (deflated 40%) 2022-05-18T05:32:12.5366953Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051236.xml (deflated 39%) 2022-05-18T05:32:12.5367696Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051241.xml (deflated 40%) 2022-05-18T05:32:12.5368438Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051246.xml (deflated 40%) 2022-05-18T05:32:12.5369161Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051251.xml (deflated 40%) 2022-05-18T05:32:12.5369914Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051255.xml (deflated 40%) 2022-05-18T05:32:12.5371108Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051301.xml (deflated 40%) 2022-05-18T05:32:12.5371857Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051306.xml (deflated 40%) 2022-05-18T05:32:12.5372584Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051310.xml (deflated 41%) 2022-05-18T05:32:12.5373325Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051316.xml (deflated 40%) 2022-05-18T05:32:12.5374070Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051320.xml (deflated 40%) 2022-05-18T05:32:12.5374895Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051324.xml (deflated 40%) 2022-05-18T05:32:12.5375640Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051330.xml (deflated 40%) 2022-05-18T05:32:12.5376384Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051334.xml (deflated 39%) 2022-05-18T05:32:12.5377124Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051339.xml (deflated 39%) 2022-05-18T05:32:12.5377863Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051344.xml (deflated 39%) 2022-05-18T05:32:12.5378585Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051348.xml (deflated 39%) 2022-05-18T05:32:12.5379330Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051352.xml (deflated 40%) 2022-05-18T05:32:12.5380125Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051357.xml (deflated 39%) 2022-05-18T05:32:12.5380866Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051405.xml (deflated 39%) 2022-05-18T05:32:12.5381611Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051409.xml (deflated 40%) 2022-05-18T05:32:12.5382332Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051413.xml (deflated 40%) 2022-05-18T05:32:12.5383070Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051419.xml (deflated 39%) 2022-05-18T05:32:12.5383807Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051423.xml (deflated 40%) 2022-05-18T05:32:12.5384642Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051428.xml (deflated 40%) 2022-05-18T05:32:12.5385367Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051434.xml (deflated 40%) 2022-05-18T05:32:12.5386108Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051438.xml (deflated 40%) 2022-05-18T05:32:12.5386847Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051443.xml (deflated 40%) 2022-05-18T05:32:12.5387592Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051447.xml (deflated 40%) 2022-05-18T05:32:12.5388314Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051453.xml (deflated 39%) 2022-05-18T05:32:12.5389057Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051457.xml (deflated 40%) 2022-05-18T05:32:12.5389800Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051501.xml (deflated 42%) 2022-05-18T05:32:12.5390539Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051502.xml (deflated 40%) 2022-05-18T05:32:12.5391255Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051507.xml (deflated 42%) 2022-05-18T05:32:12.5391996Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051508.xml (deflated 40%) 2022-05-18T05:32:12.5392738Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ProcessGroupGlooTest-20220518051514.xml (deflated 40%) 2022-05-18T05:32:12.5393452Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518051518.xml (deflated 39%) 2022-05-18T05:32:12.5394179Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518051519.xml (deflated 39%) 2022-05-18T05:32:12.5394878Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518051520.xml (deflated 39%) 2022-05-18T05:32:12.5395555Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518051521.xml (deflated 39%) 2022-05-18T05:32:12.5396237Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518051522.xml (deflated 39%) 2022-05-18T05:32:12.5396900Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-ReducerTest-20220518051523.xml (deflated 40%) 2022-05-18T05:32:12.5397602Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-RendezvousEnvTest-20220518051524.xml (deflated 39%) 2022-05-18T05:32:12.5398304Z adding: test/test-reports/python-unittest/distributed.test_c10d_gloo/TEST-TimeoutTest-20220518051527.xml (deflated 41%) 2022-05-18T05:32:12.5399073Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params/TEST-TestSummonFullParams-20220518051530.xml (deflated 93%) 2022-05-18T05:32:12.5399917Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_summon_full_params/TEST-TestSummonFullParamsNoShard-20220518051530.xml (deflated 85%) 2022-05-18T05:32:12.5400745Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_state_dict/TEST-TestFSDPStateDict-20220518051849.xml (deflated 94%) 2022-05-18T05:32:12.5401593Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestCreateTensorFromParams-20220518052130.xml (deflated 42%) 2022-05-18T05:32:12.5402474Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorMetadata-20220518052130.xml (deflated 44%) 2022-05-18T05:32:12.5403295Z adding: 
test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestLocalTensor-20220518052130.xml (deflated 59%) 2022-05-18T05:32:12.5404182Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestModuleHookApi-20220518052130.xml (deflated 58%) 2022-05-18T05:32:12.5405000Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardParameter-20220518052130.xml (deflated 60%) 2022-05-18T05:32:12.5405843Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardTensor-20220518052130.xml (deflated 60%) 2022-05-18T05:32:12.5406680Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorChunked-20220518052130.xml (deflated 88%) 2022-05-18T05:32:12.5407552Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorCustomOps-20220518052130.xml (deflated 69%) 2022-05-18T05:32:12.5408449Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorEnumerable-20220518052130.xml (deflated 85%) 2022-05-18T05:32:12.5409362Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorFromLocalShards-20220518052130.xml (deflated 85%) 2022-05-18T05:32:12.5410478Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor/TEST-TestShardedTensorFromLocalTensor-20220518052130.xml (deflated 61%) 2022-05-18T05:32:12.5411412Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20220518052246.xml (deflated 44%) 2022-05-18T05:32:12.5412350Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20220518052249.xml (deflated 44%) 2022-05-18T05:32:12.5413364Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-DistributedDataParallelSingleProcessTest-20220518052252.xml (deflated 44%) 2022-05-18T05:32:12.5414255Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-ProcessGroupShareTensorTest-20220518052256.xml (deflated 41%) 2022-05-18T05:32:12.5415074Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-ProcessGroupShareTensorTest-20220518052302.xml (deflated 42%) 2022-05-18T05:32:12.5415902Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-ProcessGroupShareTensorTest-20220518052310.xml (deflated 41%) 2022-05-18T05:32:12.5416727Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-ProcessGroupShareTensorTest-20220518052318.xml (deflated 41%) 2022-05-18T05:32:12.5417567Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052326.xml (deflated 42%) 2022-05-18T05:32:12.5418424Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052333.xml (deflated 42%) 2022-05-18T05:32:12.5419253Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052340.xml (deflated 42%) 2022-05-18T05:32:12.5420100Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052347.xml (deflated 42%) 2022-05-18T05:32:12.5420945Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052354.xml (deflated 42%) 2022-05-18T05:32:12.5421778Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052401.xml (deflated 42%) 2022-05-18T05:32:12.5422606Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052408.xml (deflated 42%) 2022-05-18T05:32:12.5423533Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_gloo/TEST-TestDistributedNNFunctionsGloo-20220518052415.xml (deflated 42%) 2022-05-18T05:32:12.5424372Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-ProcessGroupShareTensorTest-20220518052424.xml (deflated 41%) 2022-05-18T05:32:12.5425199Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-ProcessGroupShareTensorTest-20220518052432.xml (deflated 41%) 2022-05-18T05:32:12.5426004Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-ProcessGroupShareTensorTest-20220518052440.xml (deflated 41%) 2022-05-18T05:32:12.5426831Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-ProcessGroupShareTensorTest-20220518052448.xml (deflated 41%) 2022-05-18T05:32:12.5427668Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052456.xml (deflated 42%) 2022-05-18T05:32:12.5428518Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052503.xml (deflated 42%) 2022-05-18T05:32:12.5429343Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052510.xml (deflated 43%) 2022-05-18T05:32:12.5430184Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052517.xml (deflated 42%) 2022-05-18T05:32:12.5431025Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052524.xml (deflated 42%) 2022-05-18T05:32:12.5431865Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052531.xml (deflated 42%) 2022-05-18T05:32:12.5432739Z adding: test/test-reports/python-unittest/distributed.test_c10d_spawn_nccl/TEST-TestDistributedNNFunctionsNccl-20220518052538.xml (deflated 42%) 2022-05-18T05:32:12.5433523Z adding: test/test-reports/python-unittest/distributed.fsdp.test_wrap/TEST-TestAutoWrap-20220518052544.xml (deflated 82%) 2022-05-18T05:32:12.5434229Z adding: test/test-reports/python-unittest/distributed.fsdp.test_wrap/TEST-TestFSDPWrap-20220518052544.xml (deflated 85%) 2022-05-18T05:32:12.5434933Z adding: test/test-reports/python-unittest/distributed.algorithms.test_join/TEST-TestJoin-20220518052629.xml (deflated 79%) 2022-05-18T05:32:12.5435653Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_comm/TEST-TestCommunication-20220518052704.xml (deflated 91%) 2022-05-18T05:32:12.5436372Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-CommTest-20220518052733.xml (deflated 38%) 2022-05-18T05:32:12.5437129Z adding: 
test/test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20220518052737.xml (deflated 41%) 2022-05-18T05:32:12.5437951Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20220518052740.xml (deflated 40%) 2022-05-18T05:32:12.5438756Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20220518052743.xml (deflated 40%) 2022-05-18T05:32:12.5439539Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-ComputeBucketAssignmentTest-20220518052745.xml (deflated 42%) 2022-05-18T05:32:12.5440370Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20220518052748.xml (deflated 41%) 2022-05-18T05:32:12.5441223Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20220518052752.xml (deflated 42%) 2022-05-18T05:32:12.5442070Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20220518052759.xml (deflated 41%) 2022-05-18T05:32:12.5442971Z adding: test/test-reports/python-unittest/distributed.test_c10d_common/TEST-PythonProcessGroupExtensionTest-20220518052803.xml (deflated 41%) 2022-05-18T05:32:12.5443787Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_meta/TEST-TestFSDPWithMetaDevice-20220518052809.xml (deflated 86%) 2022-05-18T05:32:12.5444541Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_misc/TEST-TestFSDPMisc-20220518052832.xml (deflated 71%) 2022-05-18T05:32:12.5445367Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint/TEST-TestDistributedCheckpointing-20220518052857.xml (deflated 75%) 2022-05-18T05:32:12.5446193Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_checkpoint/TEST-TestStorageKeys-20220518052857.xml (deflated 40%) 2022-05-18T05:32:12.5446989Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_checkpoint/TEST-TestFSDPCheckpoint-20220518052916.xml (deflated 83%) 2022-05-18T05:32:12.5447868Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint/TEST-TestDistributedReshardOnLoad-20220518052931.xml (deflated 63%) 2022-05-18T05:32:12.5448815Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint/TEST-TestDistributedStateDictSaveLoad-20220518052931.xml (deflated 42%) 2022-05-18T05:32:12.5449835Z adding: test/test-reports/python-unittest/distributed._shard.checkpoint.test_file_system_checkpoint/TEST-TestDistributedStateDictSaveLoadWithSharedTensor-20220518052931.xml (deflated 45%) 2022-05-18T05:32:12.5450906Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_apply/TEST-TestApply-20220518052944.xml (deflated 61%) 2022-05-18T05:32:12.5451675Z adding: test/test-reports/python-unittest/distributed._shard.test_partial_tensor/TEST-TestPartialTensorOps-20220518052956.xml (deflated 66%) 2022-05-18T05:32:12.5452574Z adding: test/test-reports/python-unittest/distributed._shard.test_partial_tensor/TEST-TestPartialTensorReshard-20220518052956.xml (deflated 60%) 2022-05-18T05:32:12.5453432Z adding: test/test-reports/python-unittest/distributed.fsdp.test_distributed_checkpoint/TEST-TestDistributedCheckpoint-20220518053005.xml (deflated 60%) 2022-05-18T05:32:12.5454324Z adding: 
test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_binary_cmp/TEST-TestShardedTensorBinaryOps-20220518053013.xml (deflated 74%) 2022-05-18T05:32:12.5455246Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_elementwise_ops/TEST-TestShardedTensorElementWiseOps-20220518053021.xml (deflated 68%) 2022-05-18T05:32:12.5456126Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerServerTest-20220518053027.xml (deflated 71%) 2022-05-18T05:32:12.5456937Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-LocalTimerTest-20220518053027.xml (deflated 69%) 2022-05-18T05:32:12.5457796Z adding: test/test-reports/python-unittest/distributed.elastic.timer.local_timer_test/TEST-MultiprocessingRequestQueueTest-20220518053027.xml (deflated 66%) 2022-05-18T05:32:12.5458626Z adding: test/test-reports/python-unittest/distributed.test_data_parallel/TEST-TestDataParallel-20220518053035.xml (deflated 83%) 2022-05-18T05:32:12.5459429Z adding: test/test-reports/python-unittest/distributed.test_data_parallel/TEST-TestDataParallelDeviceTypeCUDA-20220518053035.xml (deflated 85%) 2022-05-18T05:32:12.5460270Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_multiple_wrapping/TEST-TestMultipleWrapping-20220518053041.xml (deflated 47%) 2022-05-18T05:32:12.5461033Z adding: test/test-reports/python-unittest/distributed.fsdp.test_fsdp_pure_fp16/TEST-TestPureFP16-20220518053047.xml (deflated 51%) 2022-05-18T05:32:12.5461826Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.ops.test_softmax/TEST-TestShardedSoftmax-20220518053053.xml (deflated 59%) 2022-05-18T05:32:12.5462722Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_sharded_tensor_reshard/TEST-TestReshard-20220518053058.xml (deflated 61%) 2022-05-18T05:32:12.5463544Z adding: test/test-reports/python-unittest/distributed._shard.sharded_optim.test_sharded_optim/TEST-TestShardedOptimizer-20220518053104.xml (deflated 59%) 2022-05-18T05:32:12.5464427Z adding: test/test-reports/python-unittest/distributed._shard.sharded_tensor.test_megatron_prototype/TEST-TestShardedTensorMegatronLinear-20220518053109.xml (deflated 44%) 2022-05-18T05:32:12.5465288Z adding: test/test-reports/python-unittest/distributed.test_launcher/TEST-TestDistributedLaunch-20220518053113.xml (deflated 46%) 2022-05-18T05:32:12.5466049Z adding: test/test-reports/python-unittest/distributed.elastic.utils.util_test/TEST-StoreUtilTest-20220518053117.xml (deflated 63%) 2022-05-18T05:32:12.5466794Z adding: test/test-reports/python-unittest/distributed.elastic.utils.util_test/TEST-UtilTest-20220518053117.xml (deflated 69%) 2022-05-18T05:32:12.5467537Z adding: test/test-reports/python-unittest/distributed.elastic.metrics.api_test/TEST-MetricsApiTest-20220518053120.xml (deflated 63%) 2022-05-18T05:32:12.5468274Z adding: test/test-reports/python-unittest/distributed.fsdp.test_utils/TEST-TestUtils-20220518053123.xml (deflated 68%) 2022-05-18T05:32:12.5506723Z ##[group]Run seemethere/upload-artifact-s3@v4 2022-05-18T05:32:12.5507024Z with: 2022-05-18T05:32:12.5507265Z retention-days: 14 2022-05-18T05:32:12.5507522Z if-no-files-found: warn 2022-05-18T05:32:12.5507801Z path: test-jsons-*.zip 2022-05-18T05:32:12.5508058Z name: artifact 2022-05-18T05:32:12.5508295Z s3-bucket: gha-artifacts 2022-05-18T05:32:12.5508563Z region: us-east-1 2022-05-18T05:32:12.5508798Z env: 2022-05-18T05:32:12.5508997Z 
IN_CI: 1 2022-05-18T05:32:12.5509221Z IS_GHA: 1 2022-05-18T05:32:12.5509472Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:32:12.5509725Z GPU_FLAG: --gpus all 2022-05-18T05:32:12.5510061Z ##[endgroup] 2022-05-18T05:32:12.9665580Z With the provided path, there will be 1 file uploaded 2022-05-18T05:32:12.9666001Z Uploading to s3 prefix: pytorch/pytorch/2342799944/1/artifact 2022-05-18T05:32:12.9676199Z Starting upload of test-jsons-test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482805675.zip 2022-05-18T05:32:13.1263517Z Finished upload of test-jsons-test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482805675.zip 2022-05-18T05:32:13.1391922Z ##[group]Run seemethere/upload-artifact-s3@v4 2022-05-18T05:32:13.1392225Z with: 2022-05-18T05:32:13.1392455Z retention-days: 14 2022-05-18T05:32:13.1392730Z if-no-files-found: error 2022-05-18T05:32:13.1393015Z path: test-reports-*.zip 2022-05-18T05:32:13.1393260Z name: artifact 2022-05-18T05:32:13.1393515Z s3-bucket: gha-artifacts 2022-05-18T05:32:13.1393779Z region: us-east-1 2022-05-18T05:32:13.1393994Z env: 2022-05-18T05:32:13.1394210Z IN_CI: 1 2022-05-18T05:32:13.1394437Z IS_GHA: 1 2022-05-18T05:32:13.1394671Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:32:13.1394962Z GPU_FLAG: --gpus all 2022-05-18T05:32:13.1395211Z ##[endgroup] 2022-05-18T05:32:13.5563447Z With the provided path, there will be 1 file uploaded 2022-05-18T05:32:13.5563883Z Uploading to s3 prefix: pytorch/pytorch/2342799944/1/artifact 2022-05-18T05:32:13.5574397Z Starting upload of test-reports-test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482805675.zip 2022-05-18T05:32:13.7226372Z Finished upload of test-reports-test-distributed-2-2-linux.8xlarge.nvidia.gpu_6482805675.zip 2022-05-18T05:32:13.7362224Z ##[group]Run set -x 2022-05-18T05:32:13.7362519Z set -x 2022-05-18T05:32:13.7362829Z python3 -m pip install -r requirements.txt 2022-05-18T05:32:13.7363175Z python3 -m pip install boto3==1.19.12 2022-05-18T05:32:13.7363561Z python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2022-05-18T05:32:13.7377595Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T05:32:13.7377901Z env: 2022-05-18T05:32:13.7378227Z IN_CI: 1 2022-05-18T05:32:13.7378464Z IS_GHA: 1 2022-05-18T05:32:13.7378720Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:32:13.7378996Z GPU_FLAG: --gpus all 2022-05-18T05:32:13.7379255Z AWS_DEFAULT_REGION: us-east-1 2022-05-18T05:32:13.7379523Z BRANCH: master 2022-05-18T05:32:13.7379846Z JOB_BASE_NAME: linux-xenial-cuda11.3-py3.7-gcc7-test 2022-05-18T05:32:13.7380158Z TEST_CONFIG: distributed 2022-05-18T05:32:13.7380418Z SHARD_NUMBER: 2 2022-05-18T05:32:13.7380759Z BUILD_ENVIRONMENT: linux-xenial-cuda11.3-py3.7-gcc7 2022-05-18T05:32:13.7381079Z PR_NUMBER: 2022-05-18T05:32:13.7381369Z SHA1: 3b2375291aab7b48442f2e6fb1ef66cebc761e24 2022-05-18T05:32:13.7381650Z TAG: 2022-05-18T05:32:13.7381870Z WORKFLOW_ID: 2342799944 2022-05-18T05:32:13.7382305Z GITHUB_TOKEN: *** 2022-05-18T05:32:13.7382583Z GHA_WORKFLOW_JOB_ID: 6482805675 2022-05-18T05:32:13.7382829Z ##[endgroup] 2022-05-18T05:32:13.7412835Z + python3 -m pip install -r requirements.txt 2022-05-18T05:32:14.0317490Z Defaulting to user installation because normal site-packages is not writeable 2022-05-18T05:32:14.0627562Z Ignoring dataclasses: markers 'python_version < "3.7"' don't match your environment 2022-05-18T05:32:14.0630891Z Requirement already satisfied: astunparse in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 2)) (1.6.3) 2022-05-18T05:32:14.0667054Z Requirement 
already satisfied: expecttest in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 3)) (0.1.3) 2022-05-18T05:32:14.0677734Z Requirement already satisfied: future in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 4)) (0.18.2) 2022-05-18T05:32:14.0688977Z Requirement already satisfied: numpy in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 5)) (1.21.6) 2022-05-18T05:32:14.0700701Z Requirement already satisfied: psutil in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 6)) (5.9.0) 2022-05-18T05:32:14.0836516Z Requirement already satisfied: pyyaml in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 7)) (6.0) 2022-05-18T05:32:14.0847062Z Requirement already satisfied: requests in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 8)) (2.26.0) 2022-05-18T05:32:14.1012578Z Requirement already satisfied: setuptools in /usr/lib/python3.7/site-packages (from -r requirements.txt (line 9)) (49.1.3) 2022-05-18T05:32:14.1258709Z Requirement already satisfied: six in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 10)) (1.16.0) 2022-05-18T05:32:14.1270156Z Requirement already satisfied: types-dataclasses in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 11)) (0.6.5) 2022-05-18T05:32:14.1277826Z Requirement already satisfied: typing_extensions in /home/ec2-user/.local/lib/python3.7/site-packages (from -r requirements.txt (line 12)) (4.2.0) 2022-05-18T05:32:14.1291774Z Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from astunparse->-r requirements.txt (line 2)) (0.37.1) 2022-05-18T05:32:14.1324894Z Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 8)) (1.26.9) 2022-05-18T05:32:14.1611153Z Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 8)) (2021.10.8) 2022-05-18T05:32:14.1621835Z Requirement already satisfied: charset-normalizer~=2.0.0; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 8)) (2.0.12) 2022-05-18T05:32:14.1646992Z Requirement already satisfied: idna<4,>=2.5; python_version >= "3" in /home/ec2-user/.local/lib/python3.7/site-packages (from requests->-r requirements.txt (line 8)) (3.3) 2022-05-18T05:32:14.2256717Z + python3 -m pip install boto3==1.19.12 2022-05-18T05:32:14.5107944Z Defaulting to user installation because normal site-packages is not writeable 2022-05-18T05:32:14.5318225Z Requirement already satisfied: boto3==1.19.12 in /home/ec2-user/.local/lib/python3.7/site-packages (1.19.12) 2022-05-18T05:32:14.5388111Z Requirement already satisfied: botocore<1.23.0,>=1.22.12 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (1.22.12) 2022-05-18T05:32:14.5449409Z Requirement already satisfied: s3transfer<0.6.0,>=0.5.0 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.5.2) 2022-05-18T05:32:14.5484399Z Requirement already satisfied: jmespath<1.0.0,>=0.7.1 in /home/ec2-user/.local/lib/python3.7/site-packages (from boto3==1.19.12) (0.10.0) 2022-05-18T05:32:14.5501893Z Requirement already satisfied: python-dateutil<3.0.0,>=2.1 in 
/home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (2.8.2) 2022-05-18T05:32:14.5536271Z Requirement already satisfied: urllib3<1.27,>=1.25.4 in /home/ec2-user/.local/lib/python3.7/site-packages (from botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.26.9) 2022-05-18T05:32:14.5747400Z Requirement already satisfied: six>=1.5 in /home/ec2-user/.local/lib/python3.7/site-packages (from python-dateutil<3.0.0,>=2.1->botocore<1.23.0,>=1.22.12->boto3==1.19.12) (1.16.0) 2022-05-18T05:32:14.6867085Z + python3 -m tools.stats.print_test_stats --upload-to-s3 --compare-with-s3 test 2022-05-18T05:32:18.8585083Z [scribe] Scribe access token not provided, sending report via boto3... 2022-05-18T05:32:18.8585604Z 2022-05-18T05:32:18.8585958Z ----- Historic stats comparison result ------ 2022-05-18T05:32:18.8586171Z 2022-05-18T05:32:18.8586409Z job: linux-xenial-cuda11.3-py3.7-gcc7-test 2022-05-18T05:32:18.8586781Z commit: 3b2375291aab7b48442f2e6fb1ef66cebc761e24 2022-05-18T05:32:18.8586983Z 2022-05-18T05:32:18.8587197Z Commit graph (base is most recent master ancestor with at least one S3 report): 2022-05-18T05:32:18.8587444Z 2022-05-18T05:32:18.8587557Z : (master) 2022-05-18T05:32:18.8587765Z | 2022-05-18T05:32:18.8588041Z * 3b2375291a (HEAD) total time 3658.29s 2022-05-18T05:32:18.8588632Z * 6e3391a7c3 (base) 4 reports, total time 3593.15s ± 212.32s 2022-05-18T05:32:18.8589063Z * 48581d74ad 4 reports, total time 3597.75s ± 289.49s 2022-05-18T05:32:18.8589504Z * c35bd8d423 2 reports, total time 2907.31s ± 734.98s 2022-05-18T05:32:18.8589978Z * f6beda89c6 6 reports, total time 3022.28s ± 967.61s 2022-05-18T05:32:18.8590404Z * ee080918df 6 reports, total time 3423.34s ± 1100.01s 2022-05-18T05:32:18.8590711Z * bbaefdf6b5 0 reports 2022-05-18T05:32:18.8590981Z * 7c52f204e0 0 reports 2022-05-18T05:32:18.8591245Z * e0451d8022 0 reports 2022-05-18T05:32:18.8591614Z * 4e2f5507d0 6 reports, total time 3460.20s ± 1126.45s 2022-05-18T05:32:18.8592033Z * b64845eb18 6 reports, total time 3428.22s ± 1095.49s 2022-05-18T05:32:18.8592315Z | 2022-05-18T05:32:18.8592521Z : 2022-05-18T05:32:18.8592659Z 2022-05-18T05:32:18.8592830Z Removed (across 292 suites) 0 tests, totaling 0.00s 2022-05-18T05:32:18.8593185Z Modified (across 0 suites) 0 tests, totaling 0.00s 2022-05-18T05:32:18.8593522Z Added (across 76 suites) 1035 tests, totaling +3658.29s 2022-05-18T05:32:18.9113457Z Prepare all required actions 2022-05-18T05:32:18.9136514Z ##[group]Run ./.github/actions/teardown-linux 2022-05-18T05:32:18.9136804Z with: 2022-05-18T05:32:18.9137002Z env: 2022-05-18T05:32:18.9137221Z IN_CI: 1 2022-05-18T05:32:18.9137450Z IS_GHA: 1 2022-05-18T05:32:18.9137683Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:32:18.9137950Z GPU_FLAG: --gpus all 2022-05-18T05:32:18.9138200Z ##[endgroup] 2022-05-18T05:32:18.9155694Z ##[group]Run .github/scripts/wait_for_ssh_to_drain.sh 2022-05-18T05:32:18.9156059Z .github/scripts/wait_for_ssh_to_drain.sh 2022-05-18T05:32:18.9169554Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T05:32:18.9169996Z env: 2022-05-18T05:32:18.9170220Z IN_CI: 1 2022-05-18T05:32:18.9170607Z IS_GHA: 1 2022-05-18T05:32:18.9170867Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:32:18.9171138Z GPU_FLAG: --gpus all 2022-05-18T05:32:18.9171371Z ##[endgroup] 2022-05-18T05:32:18.9215982Z Holding runner for 2 hours until all ssh sessions have logged out 2022-05-18T05:32:18.9264333Z ##[group]Run # ignore expansion of "docker ps -q" since it could be empty 2022-05-18T05:32:18.9264777Z # ignore 
expansion of "docker ps -q" since it could be empty 2022-05-18T05:32:18.9265108Z # shellcheck disable=SC2046 2022-05-18T05:32:18.9265428Z docker stop $(docker ps -q) || true 2022-05-18T05:32:18.9265750Z # Prune all of the docker images 2022-05-18T05:32:18.9266037Z docker system prune -af 2022-05-18T05:32:18.9278278Z shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0} 2022-05-18T05:32:18.9278589Z env: 2022-05-18T05:32:18.9278806Z IN_CI: 1 2022-05-18T05:32:18.9279037Z IS_GHA: 1 2022-05-18T05:32:18.9279296Z GIT_DEFAULT_BRANCH: master 2022-05-18T05:32:18.9279551Z GPU_FLAG: --gpus all 2022-05-18T05:32:18.9279808Z ##[endgroup] 2022-05-18T05:32:19.2840630Z ee34c49c9c62 2022-05-18T05:32:19.7880836Z Deleted Containers: 2022-05-18T05:32:19.7881264Z ee34c49c9c62c22ef7a6ae17e6a604c5c0073de7fa7971bf62d1b5af644989c0 2022-05-18T05:32:19.7881529Z 2022-05-18T05:32:24.2981700Z Deleted Images: 2022-05-18T05:32:24.2982581Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7:6deab82db6a72ca54cd3e3322ee4f13864536734 2022-05-18T05:32:24.2983608Z untagged: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda11.3-cudnn8-py3-gcc7@sha256:66b56fbc2d0d8bf75af01c4976aba15f28c9802507dc01f27e71a55f8ffc13e0 2022-05-18T05:32:24.2984236Z deleted: sha256:236fb78bf4f994bc51deea39a9f1233f16926c9987665659eb61a9b813bac802 2022-05-18T05:32:24.2984694Z deleted: sha256:b9698d2c371e7954d7e15ccafd1fef13432b7e600eae532a3030f303baec803d 2022-05-18T05:32:24.2985155Z deleted: sha256:aa93f12269ac15489a70d1942bf2ab9744530e9058a09fd7d710c3eea67feb3f 2022-05-18T05:32:24.2985594Z deleted: sha256:b9bc18d2b222e195933fb238ac84e3e03734a45e81528f1753ddff97b9c72a43 2022-05-18T05:32:24.2986017Z deleted: sha256:80ca15a8657ced4067141453e7480c432fcdca438f0497640923838dea73ce43 2022-05-18T05:32:24.2986439Z deleted: sha256:f8866d582b5e8f7e1500d74db64487ea732384350859643a9b2abebb4573e705 2022-05-18T05:32:24.2986866Z deleted: sha256:32eb71a8af1989f422d1ef4e96915a0c579851537919e8b5dcdd98a59eab5b10 2022-05-18T05:32:24.2987296Z deleted: sha256:b93ecf6274d2fb0d04d3f59b9c1fbf6bfdf45a37acc7c109a522f192740766dd 2022-05-18T05:32:24.2987757Z deleted: sha256:8a6e85c33391306e1c7f415daeecf5998a12666c9afcfbeff15218240d0b53f3 2022-05-18T05:32:24.2988193Z deleted: sha256:991307397ef5ab4552d4a5f9293bd89db5bf7682892c0fdcea9bdfec12cd577e 2022-05-18T05:32:24.2988633Z deleted: sha256:1fa50cb91821cfcda4c803fdac009cc768715050e77c6b1266a3413a72f1d649 2022-05-18T05:32:24.2989077Z deleted: sha256:1a97642bd09c887f28cafcf79d1b815c3aa17a2bafacfc0bd7e934c709804b24 2022-05-18T05:32:24.2989539Z deleted: sha256:f3e369c2c977f35de33f7cbe9dbebf1bde8f63488c24b1a916883c49e512b3dd 2022-05-18T05:32:24.2989989Z deleted: sha256:2af47fb560e6db2934a930d03dc373cd29eaf5567afc06954d03b685020af41b 2022-05-18T05:32:24.2990615Z deleted: sha256:f906967fe7d2f8e2cd2fc3856d68bb8b1ae478cb816465468433f3eb48776dab 2022-05-18T05:32:24.2991085Z deleted: sha256:62b1fc0017c5b9541319bb613a6635ef68c0bee21ea46b358cfd629f18902d18 2022-05-18T05:32:24.2991521Z deleted: sha256:d58a6e6766e5fc8073d498afe668a52137146de79e2be473de919412cfb0c5c5 2022-05-18T05:32:24.2991954Z deleted: sha256:61347a34ee15c4d9cca6d181c32640066ba3456f9b9eb00843c55da3108fdc53 2022-05-18T05:32:24.2992369Z deleted: sha256:b0368f26a63aa7f28f7099b397d015f17d1c8785aec8dc11f315b7911912588d 2022-05-18T05:32:24.2992809Z deleted: sha256:6c5d0b2c8ac319699b8d86fc3f073fd7edb990189995fa9b35abece668fbb2d2 2022-05-18T05:32:24.2993390Z deleted: 
sha256:7eef87c7b27adca5d7956bdc256c2b9c386896dd2f9142cb2c65601315b885a4 2022-05-18T05:32:24.2993820Z deleted: sha256:bad3c554c61dc113ef9996c78f018493bef1c51b614c489258964beb48a212d1 2022-05-18T05:32:24.2994263Z deleted: sha256:4f80a663751e4147a2bcbb9fdaa829c498eb6c18f8e8218502d5a2f868b75d73 2022-05-18T05:32:24.2994714Z deleted: sha256:98ed9491a363c4fd7ebeaa38caf6c4216ad5032d9415ce7abdc39b6149a5ce4c 2022-05-18T05:32:24.2995157Z deleted: sha256:4c71586cb548b8efa3884f847e3dfffb6514648d8131654de3ae8c3165a935c0 2022-05-18T05:32:24.2995565Z deleted: sha256:3365026f6ab431e63e4c2b8b3b4ea77640968a1031c71e229be682a7e6865992 2022-05-18T05:32:24.2996008Z deleted: sha256:6f9d8a5abe7610cc746fabddd5fe5de01ac404017fc5a4f3009a83caec28f771 2022-05-18T05:32:24.2996444Z deleted: sha256:943055dd5d358663a978746c9e41e2c2053d132f045219c20a164176f6228da4 2022-05-18T05:32:24.2996842Z deleted: sha256:675297863e1d26af6fd4cd4550d3aadf811d9e31d4e2f53939ac0de9d13a6447 2022-05-18T05:32:24.2997279Z deleted: sha256:535e5f66a4c0f3149aa7327c8fd55b2a4aab80016c4639571c5268cbbe006ef6 2022-05-18T05:32:24.2997715Z deleted: sha256:4ba20979a69fdebf50a0fcea34ae16be95d9556b6b2e9656387d450204a32739 2022-05-18T05:32:24.2998153Z deleted: sha256:d538786f7a446ecb0b2d077dae03561ad192e00337811bb1a23286d2ff720889 2022-05-18T05:32:24.2998564Z deleted: sha256:3691b15bb27dde54c31442622d1c70d6c31b8fc91576c15a4d031a7390b7c9d7 2022-05-18T05:32:24.2999003Z deleted: sha256:51f052fa0c9fef786ecdeca36b244e1d756793d5ac3c9eab25183237ccad7c44 2022-05-18T05:32:24.2999457Z deleted: sha256:29d62475fed7ed1c35cef64d114cf77e111f7f06ac5f861d3e15ab1423ba403c 2022-05-18T05:32:24.2999876Z deleted: sha256:e64ff51bd9cf74d567736df27f09a71215054d7a6fd387fb8a1525863bea2965 2022-05-18T05:32:24.3000314Z deleted: sha256:8ce2ebc2b9122185e0bf1b7079903ad8839f5dd8b74b422833ed4f32a21786ec 2022-05-18T05:32:24.3000748Z deleted: sha256:c1f0e89774fc81730998b64c9e2cd56d0e5fd033b100a0f74a1b9bddca647997 2022-05-18T05:32:24.3001182Z deleted: sha256:045405eee158f603e8d2840c5696f23bff982b34dea8ce059497806acacc6891 2022-05-18T05:32:24.3001948Z deleted: sha256:ca4847070736f4c4ca7e5075c715e4845d0a30a41aab34f473e0753094e5ebf0 2022-05-18T05:32:24.3002707Z deleted: sha256:89767b0be2e7dda030b2a7bbd1df9bd63bbd13e0737b8b5ee3b0643897b36459 2022-05-18T05:32:24.3003165Z deleted: sha256:739ba1ab17ff0d1e81e1ac36c75472c62975342fa8fe8993206dd31f5780e105 2022-05-18T05:32:24.3003606Z deleted: sha256:da279ce0cf78b3443ee69f3308ecdcfa27525db93b9243e7f7c01ee80da21bd5 2022-05-18T05:32:24.3004065Z deleted: sha256:88f543990c97cd012b9e17b81fd42bff0dff5a06c70187fbc19b1860a6604b96 2022-05-18T05:32:24.3004507Z deleted: sha256:d4c156eabe2ffb174bb8b81474a3551cc41b23647c1448f33a07162a05bcb6d1 2022-05-18T05:32:24.3004947Z deleted: sha256:3823e0dd401f484d0f5471862f350ef88ffd14ecb5f1bd7329f4b6902192a905 2022-05-18T05:32:24.3005355Z deleted: sha256:1d87640a243e42970325889dd0d6ca21c6fc3c50efac95d88624ad1463d2f9a0 2022-05-18T05:32:24.3005768Z deleted: sha256:7423922c27fd43adc890485039d039307be042bd004ce39a462d7f8ee969125b 2022-05-18T05:32:24.3006200Z deleted: sha256:18b3010e02831349f67561e26c28fbace9501706ea0780b77339475581c2e40e 2022-05-18T05:32:24.3006590Z deleted: sha256:0214f4b057d78b44fd12702828152f67c0ce115f9346acc63acdf997cab7e7c8 2022-05-18T05:32:24.3007001Z deleted: sha256:1b9d0485372c5562fa614d5b35766f6c442539bcee9825a6e90d1158c3299a61 2022-05-18T05:32:24.3007512Z deleted: sha256:3c0f34be6eb98057c607b9080237cce0be0b86f52d51ba620dc018a3d421baea 2022-05-18T05:32:24.3007962Z deleted: 
sha256:be96a3f634de79f523f07c7e4e0216c28af45eb5776e7a6238a2392f71e01069 2022-05-18T05:32:24.3008215Z 2022-05-18T05:32:24.3008333Z Total reclaimed space: 15.91GB 2022-05-18T05:32:24.3064908Z Post job cleanup. 2022-05-18T05:32:24.3100781Z Post job cleanup. 2022-05-18T05:32:24.4425772Z [command]/usr/bin/git version 2022-05-18T05:32:24.4474330Z git version 2.32.0 2022-05-18T05:32:24.4539263Z Temporarily overriding HOME='/home/ec2-user/actions-runner/_work/_temp/04b2a092-26db-4a2c-a9f2-9f1c640282fe' before making global git config changes 2022-05-18T05:32:24.4540018Z Adding repository directory to the temporary git global config as a safe directory 2022-05-18T05:32:24.4548775Z [command]/usr/bin/git config --global --add safe.directory /home/ec2-user/actions-runner/_work/pytorch/pytorch 2022-05-18T05:32:24.4595527Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2022-05-18T05:32:24.4634874Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || : 2022-05-18T05:32:24.4963453Z Entering 'android/libs/fbjni' 2022-05-18T05:32:24.5004140Z Entering 'third_party/FP16' 2022-05-18T05:32:24.5046597Z Entering 'third_party/FXdiv' 2022-05-18T05:32:24.5085627Z Entering 'third_party/NNPACK' 2022-05-18T05:32:24.5125883Z Entering 'third_party/QNNPACK' 2022-05-18T05:32:24.5166720Z Entering 'third_party/XNNPACK' 2022-05-18T05:32:24.5219317Z Entering 'third_party/benchmark' 2022-05-18T05:32:24.5260502Z Entering 'third_party/cpuinfo' 2022-05-18T05:32:24.5302275Z Entering 'third_party/cub' 2022-05-18T05:32:24.5345326Z Entering 'third_party/cudnn_frontend' 2022-05-18T05:32:24.5392236Z Entering 'third_party/eigen' 2022-05-18T05:32:24.5435904Z Entering 'third_party/fbgemm' 2022-05-18T05:32:24.5477543Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T05:32:24.5518567Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T05:32:24.5560287Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T05:32:24.5602071Z Entering 'third_party/flatbuffers' 2022-05-18T05:32:24.5646806Z Entering 'third_party/fmt' 2022-05-18T05:32:24.5687046Z Entering 'third_party/foxi' 2022-05-18T05:32:24.5727240Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T05:32:24.5768029Z Entering 'third_party/gloo' 2022-05-18T05:32:24.5807941Z Entering 'third_party/googletest' 2022-05-18T05:32:24.5849606Z Entering 'third_party/ideep' 2022-05-18T05:32:24.5890032Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T05:32:24.5933890Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T05:32:24.5981508Z Entering 'third_party/ios-cmake' 2022-05-18T05:32:24.6021834Z Entering 'third_party/kineto' 2022-05-18T05:32:24.6064350Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T05:32:24.6105834Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T05:32:24.6147479Z Entering 'third_party/nccl/nccl' 2022-05-18T05:32:24.6188794Z Entering 'third_party/neon2sse' 2022-05-18T05:32:24.6230346Z Entering 'third_party/onnx' 2022-05-18T05:32:24.6282482Z Entering 'third_party/onnx/third_party/benchmark' 2022-05-18T05:32:24.6323401Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T05:32:24.6366541Z Entering 'third_party/onnx-tensorrt' 2022-05-18T05:32:24.6406994Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T05:32:24.6452771Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T05:32:24.6493678Z 
Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T05:32:24.6535530Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T05:32:24.6580130Z Entering 'third_party/pocketfft' 2022-05-18T05:32:24.6620497Z Entering 'third_party/protobuf' 2022-05-18T05:32:24.6664809Z Entering 'third_party/protobuf/third_party/benchmark' 2022-05-18T05:32:24.6704176Z Entering 'third_party/protobuf/third_party/googletest' 2022-05-18T05:32:24.6747478Z Entering 'third_party/psimd' 2022-05-18T05:32:24.6789014Z Entering 'third_party/pthreadpool' 2022-05-18T05:32:24.6829803Z Entering 'third_party/pybind11' 2022-05-18T05:32:24.6872201Z Entering 'third_party/python-enum' 2022-05-18T05:32:24.6913445Z Entering 'third_party/python-peachpy' 2022-05-18T05:32:24.6955495Z Entering 'third_party/python-six' 2022-05-18T05:32:24.6997684Z Entering 'third_party/sleef' 2022-05-18T05:32:24.7039119Z Entering 'third_party/tbb' 2022-05-18T05:32:24.7082418Z Entering 'third_party/tensorpipe' 2022-05-18T05:32:24.7123568Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-05-18T05:32:24.7164512Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-05-18T05:32:24.7205137Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-05-18T05:32:24.7245721Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T05:32:24.7286800Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T05:32:24.7330201Z Entering 'third_party/zstd' 2022-05-18T05:32:24.7391685Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2022-05-18T05:32:24.7420766Z http.https://github.com/.extraheader 2022-05-18T05:32:24.7431562Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader 2022-05-18T05:32:24.7470724Z [command]/usr/bin/git submodule foreach --recursive git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || : 2022-05-18T05:32:24.7789200Z Entering 'android/libs/fbjni' 2022-05-18T05:32:24.7813428Z http.https://github.com/.extraheader 2022-05-18T05:32:24.7844838Z Entering 'third_party/FP16' 2022-05-18T05:32:24.7869917Z http.https://github.com/.extraheader 2022-05-18T05:32:24.7901377Z Entering 'third_party/FXdiv' 2022-05-18T05:32:24.7925171Z http.https://github.com/.extraheader 2022-05-18T05:32:24.7957020Z Entering 'third_party/NNPACK' 2022-05-18T05:32:24.7982578Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8014747Z Entering 'third_party/QNNPACK' 2022-05-18T05:32:24.8038556Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8071221Z Entering 'third_party/XNNPACK' 2022-05-18T05:32:24.8095438Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8137796Z Entering 'third_party/benchmark' 2022-05-18T05:32:24.8162162Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8193624Z Entering 'third_party/cpuinfo' 2022-05-18T05:32:24.8217783Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8249846Z Entering 'third_party/cub' 2022-05-18T05:32:24.8274364Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8306032Z Entering 'third_party/cudnn_frontend' 2022-05-18T05:32:24.8330223Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8367435Z Entering 'third_party/eigen' 2022-05-18T05:32:24.8392000Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8426792Z Entering 'third_party/fbgemm' 
2022-05-18T05:32:24.8451242Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8481914Z Entering 'third_party/fbgemm/third_party/asmjit' 2022-05-18T05:32:24.8506639Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8537896Z Entering 'third_party/fbgemm/third_party/cpuinfo' 2022-05-18T05:32:24.8561338Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8593883Z Entering 'third_party/fbgemm/third_party/googletest' 2022-05-18T05:32:24.8617935Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8649831Z Entering 'third_party/flatbuffers' 2022-05-18T05:32:24.8674079Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8708644Z Entering 'third_party/fmt' 2022-05-18T05:32:24.8733003Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8763802Z Entering 'third_party/foxi' 2022-05-18T05:32:24.8788187Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8819862Z Entering 'third_party/gemmlowp/gemmlowp' 2022-05-18T05:32:24.8843481Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8876979Z Entering 'third_party/gloo' 2022-05-18T05:32:24.8902762Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8934936Z Entering 'third_party/googletest' 2022-05-18T05:32:24.8959151Z http.https://github.com/.extraheader 2022-05-18T05:32:24.8992838Z Entering 'third_party/ideep' 2022-05-18T05:32:24.9017560Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9047693Z Entering 'third_party/ideep/mkl-dnn' 2022-05-18T05:32:24.9072695Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9106400Z Entering 'third_party/ideep/mkl-dnn/third_party/oneDNN' 2022-05-18T05:32:24.9129942Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9168977Z Entering 'third_party/ios-cmake' 2022-05-18T05:32:24.9194285Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9225805Z Entering 'third_party/kineto' 2022-05-18T05:32:24.9250332Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9282580Z Entering 'third_party/kineto/libkineto/third_party/fmt' 2022-05-18T05:32:24.9308167Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9340607Z Entering 'third_party/kineto/libkineto/third_party/googletest' 2022-05-18T05:32:24.9364202Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9398463Z Entering 'third_party/nccl/nccl' 2022-05-18T05:32:24.9423514Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9455279Z Entering 'third_party/neon2sse' 2022-05-18T05:32:24.9479710Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9511662Z Entering 'third_party/onnx' 2022-05-18T05:32:24.9536275Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9580427Z Entering 'third_party/onnx/third_party/benchmark' 2022-05-18T05:32:24.9604538Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9637277Z Entering 'third_party/onnx/third_party/pybind11' 2022-05-18T05:32:24.9661981Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9696520Z Entering 'third_party/onnx-tensorrt' 2022-05-18T05:32:24.9720270Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9751540Z Entering 'third_party/onnx-tensorrt/third_party/onnx' 2022-05-18T05:32:24.9776394Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9813747Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/benchmark' 2022-05-18T05:32:24.9837200Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9869617Z Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11' 2022-05-18T05:32:24.9894372Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9925520Z 
Entering 'third_party/onnx-tensorrt/third_party/onnx/third_party/pybind11/tools/clang' 2022-05-18T05:32:24.9950562Z http.https://github.com/.extraheader 2022-05-18T05:32:24.9987422Z Entering 'third_party/pocketfft' 2022-05-18T05:32:25.0011554Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0043297Z Entering 'third_party/protobuf' 2022-05-18T05:32:25.0068175Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0104061Z Entering 'third_party/protobuf/third_party/benchmark' 2022-05-18T05:32:25.0128047Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0159560Z Entering 'third_party/protobuf/third_party/googletest' 2022-05-18T05:32:25.0185206Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0219384Z Entering 'third_party/psimd' 2022-05-18T05:32:25.0243381Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0275144Z Entering 'third_party/pthreadpool' 2022-05-18T05:32:25.0300451Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0332137Z Entering 'third_party/pybind11' 2022-05-18T05:32:25.0356058Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0387900Z Entering 'third_party/python-enum' 2022-05-18T05:32:25.0413017Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0444155Z Entering 'third_party/python-peachpy' 2022-05-18T05:32:25.0469483Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0502290Z Entering 'third_party/python-six' 2022-05-18T05:32:25.0525859Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0557494Z Entering 'third_party/sleef' 2022-05-18T05:32:25.0582631Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0614674Z Entering 'third_party/tbb' 2022-05-18T05:32:25.0638527Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0672339Z Entering 'third_party/tensorpipe' 2022-05-18T05:32:25.0697414Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0728581Z Entering 'third_party/tensorpipe/third_party/googletest' 2022-05-18T05:32:25.0752907Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0785389Z Entering 'third_party/tensorpipe/third_party/libnop' 2022-05-18T05:32:25.0809115Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0841231Z Entering 'third_party/tensorpipe/third_party/libuv' 2022-05-18T05:32:25.0865685Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0898172Z Entering 'third_party/tensorpipe/third_party/pybind11' 2022-05-18T05:32:25.0921870Z http.https://github.com/.extraheader 2022-05-18T05:32:25.0953155Z Entering 'third_party/tensorpipe/third_party/pybind11/tools/clang' 2022-05-18T05:32:25.0977482Z http.https://github.com/.extraheader 2022-05-18T05:32:25.1012352Z Entering 'third_party/zstd' 2022-05-18T05:32:25.1036288Z http.https://github.com/.extraheader 2022-05-18T05:32:25.1341525Z Cleaning up orphan processes
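
Note on the teardown recorded above: the tail of this log shows two cleanup passes — the runner stops and prunes all Docker containers/images (reclaiming 15.91GB here), and the checkout action's post-job step scrubs the injected http.https://github.com/.extraheader auth setting from the repository and every submodule. The shell sketch below restates those steps in one place for reference. The commands themselves are taken verbatim from the step output above; the standalone-script framing, the set -euo pipefail line, and the simplified submodule loop are assumptions for illustration and are not the workflow's actual files.

    #!/usr/bin/env bash
    # Sketch only: approximates the post-job cleanup recorded in this log.
    set -euo pipefail

    # Stop any running containers, then reclaim image/layer space
    # (mirrors the "Kill containers, clean up images" step above).
    # shellcheck disable=SC2046  # word splitting of `docker ps -q` is intended
    docker stop $(docker ps -q) || true
    docker system prune -af

    # Drop the auth header the checkout action injected, in the top-level
    # repo and recursively in every submodule (mirrors the post-job
    # "git config --unset-all" pass; `|| :` tolerates keys that are absent).
    git config --local --unset-all 'http.https://github.com/.extraheader' || :
    git submodule foreach --recursive \
      "git config --local --unset-all 'http.https://github.com/.extraheader' || :"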